In 1946, R. L. Anderson was asked to set up a course in mathematical 
statistics with an applied viewpoint for the Department of Experimental 
Statistics at North Carolina State College. Since no textbook of this 
type was then available, a set of notes was prepared in consultation with 
W. G. Cochran. These notes were mimeographed and later became a 
part of the Institute of Statistics Mimeo Series. This material borrowed 
heavily from the notes taken by the author from Professor Cochran at 
Iowa State College. About the same time, T. A. Bancroft was organizing 
a set of notes similar to Part I, plus the material on regression in this 
text, at Iowa State College and in 1948 proposed that the two sets be 
amalgamated into a textbook. 

After numerous rewritings, we have settled on the present version of 
a combined textbook in mathematical statistics and a reference book for 
the research worker. This book has been divided into two parts. Part I 
presents basic statistical theory with some emphasis on research problems; 
Part II presents the theory of least squares and its use in the analysis 
of experimental data. Bancroft had the primary responsibility of 
writing Part I and Anderson of Part II, although there have been frequent 
consultations between the authors to maintain reasonable continuity 
throughout the book. At North Carolina State College, this material 
has been used in three courses: (1) a one-year course in applied 
mathematical statistics, (2) some of the elementary parts of Chaps. 17 to 24 as 
supplementary material for a one-quarter course in design of experiments, 
and (3) the more advanced material in Part II for a one-quarter 
course in advanced experimental statistics. At Iowa State College, 
Part I plus the material on regression has been taught in a two-quarter 
theory course, and selected sections of Part II in a one-quarter advanced 
methods course. 

Many research workers have expressed a need for a convenient reference 
book on statistical theory pointed to research problems, which 
could be used in conjunction with their books on general statistical 
methods, experimental design, and survey sampling. The authors have 
tried to write a book which would serve this purpose as well as that of a 
textbook in statistical theory. They realize the difficulties in such an 
undertaking and will welcome suggestions on methods of improving this 
book both from a reference and from a text standpoint, without adding 
materially to the complexity of the material or the length of the book. 

If this book is used as a class text, the teacher should make a choice of 
the topics to be covered. It was not contemplated that all of the material 
could be covered in a one-year course. Points which must be considered 
in deciding on topics to be taken up are number of lecture and 
laboratory hours available, interests of the students, previous training 
of the students (in mathematics, statistics, and experimentation), and 
purpose of the course (terminal or part of a sequence). A suggested list 
of chapters and sections for a one-year course in statistics is presented 
in the next section. We have postulated that only a good background 
in differential and integral calculus is required and that the student might 
use this book as an introduction to statistics. However, a previous or 
concurrent course in statistical methods is quite helpful. Several specialized 
topics in matrix algebra and advanced calculus, necessary to an 
understanding of certain parts of the theory, have been included. Of 
course, previous work in these or other courses in mathematics would 
be helpful. 

Perhaps the best method of studying the theoretical aspects of statistical 
techniques is to look at the examples first and then go back to the 
theory. However, the authors feel that it is best to develop a general 
theory first, since many research workers will have their own examples in 
mind when they read this material, and the authors want the theory to 
be general enough to apply to these examples, not pointed at examples in 
the text. In lecturing on the material in this book, it may be advisable 
to present the theory and an example together. 

It should be noted that the material in Part II, plus the necessary 
background material in Part I, could very well form the basis for an 
introductory course in the theory and methods of experimental design. 
This course would include the omitted sections in the suggested outline 
for a first course in statistics, especially Chaps. 19, 23, and 24. 

As indicated earlier in this preface, the authors are indebted to Prof. 
W. G. Cochran for a majority of their early ideas in mathematical and 
applied statistics. It is understood that they are responsible for the 
interpretation and amplification of these ideas as presented in this book 
and for any mistakes in theory or application therein. They wish to 
express their appreciation to Profs. G. E. Nicholson and H. F. Smith, 
who made valuable suggestions for improving the presentation after 
teaching from the original mimeographed materials, and to Om Aggarwal 
for his help in working through the examples and exercises in Part I. 
In addition they received many helpful comments from other staff members 
and graduate students and from the reviewers of the original manuscript. 
Special thanks are due their typists, Mrs. Jeanne Rathz and 
Mrs. Margaret Kirwin, who did yeoman service in interpreting the 
authors’ hieroglyphics. 

The authors are indebted to Prof. George W. Snedecor and the Iowa 
State College Press for permission to use the many examples and tables 
taken from Statistical Methods. They are also indebted to Miss Catherine 
Thompson and Prof. Egon S. Pearson, editor of Biometrika, for permission 
to include Table II in the Appendix, which is an abridged version 
of a table presented in Biometrika; and to Prof. Ronald A. Fisher, 
Cambridge, Dr. Frank Yates, Rothamsted, and Messrs. Oliver & Boyd, Ltd., 
Edinburgh and London, for permission to reprint Table III in the Appendix, 
which is an abridged version of Table III in their book Statistical 
Tables for Biological, Agricultural, and Medical Research. Finally they 
express their sincere appreciation for permission to use the various sets of 
experimental data included in their examples and exercises. 

R. L. Anderson 
T. A. Bancroft 

Raleigh, N. C. 

Ames, Iowa 
July, 1952 
(Sec. 4.6). 

5. Chapters 8 to 11 are concerned with the fundamental theory of estimation 
and tests of significance. The teacher may wish to discuss only 
the applications of these ideas without taking up all of the theoretical 
details. For example, it may be necessary to discuss the concept of 
power (Secs. 11.4 to 11.6) in a condensed manner and spend most of 
the time in Chap. 11 on the methods of setting up the likelihood-ratio 
criterion (Sec. 11.7). 

6. It may be necessary to delete the theoretical parts of Chap. 12 and all 
of Secs. 12.6 and 12.8. 


Part II 

1. Study all of Chap. 13, but only Sec. 14.1 of Chap. 14, unless matrix- 
algebra methods are desired. 

2. It may be desirable to teach only one method of computation in Chap. 
15 and omit either Sec. 15.2 or Sec. 15.3. 

3. All of Chap. 16 probably should be omitted except for a short assignment 
on curve fitting alone, with no theory. 

4. A selection of topics from Chaps. 17 to 25 should be made. The 
(a) Chapter 17 is a discussion chapter and should be read, but the 
authors doubt that the student will understand much of it unless 
he has handled some experiments previously. 

(b) One or two examples from each type of design in Chap. 18 should 
suffice. 

(c) Omit Chap. 19 (incomplete-blocks designs). 
20.6. 

(e) Possibly omit the theory for covariance (Chap. 21). Omit Secs. 
21.3 and 21.4. 

(f) Since the theory of variance components is quite complicated, it 
may be necessary to consider only a few simple problems. Omit 
Sec. 22.4. 
understand Chap. 23 (the mixed model), but it might be instructive 
erally want to use a mixed model. The split-plot design should be 
(h) Omit Chap. 24. 

CHAPTER 1 

INTRODUCTION 


1.1. What Is Statistics? Dissatisfied with attempts at giving a 
precise formal definition of their subject, mathematicians have on occasion 
defined mathematics as being those professional activities engaged in 
by mathematicians. Later in this chapter an attempt will be made to 
give a more formal definition of the subject of statistics. For the present, 
however, if in the above definition the words “mathematics” and 
“mathematician” are replaced by “statistics” and “statistician,” a definition 
of statistics might be written as follows: 

Statistics comprises those professional activities engaged in by statisticians. 

In order to gain some insight into the nature of statistics, then, it 
remains to present in some detail the steps taken by a statistician in 
some simple yet representative investigation. 

1.2. A Representative Investigation. The problem is: Is test material 
B, which is the same as standard material A with one added chemical, a 
more deadly fly spray than A? 

Step a. A hypothesis, sometimes referred to as a null hypothesis, is 
set up. In this case a pertinent hypothesis, for which simple techniques 
for testing are available, is: Test material B, with the added chemical, is 
equally effective as standard material A in killing flies. 

Step b. An experiment is designed to test the hypothesis. Four 
batches of 100 randomly selected flies are sprayed with each spray. 

Step c. Pertinent data are collected and tabulated. The numbers of 
flies killed in each batch are given in the following table: 


  A    Rank      B    Rank 
 60      7      68      3 
 61      6      69      2 
 67      4      64      5 
 56      8      71      1 
---    ----    ---    ---- 
244     25     272     11 


The numbers killed in the eight batches have been ranked from 1 to 8 
and the ranks summed for A and B separately in order to make use of the 
simple techniques for testing mentioned in step a. A low sum indicates 
a more effective kill. 
Now, if B is actually more effective than A, we should expect the experiment 
to give evidence of this, that is, the sum of the ranks, S, for B would 
be expected to be less than that for A. The total of the eight ranks is 
1 + 2 + · · · + 8 = 36; hence, the sum for A is 36 - S. The possible 
values of S range from 1 + 2 + 3 + 4 = 10 to 5 + 6 + 7 + 8 = 26. 
In our experiment S = 11. A critical question is now in order: Does this 
low value of S, for this one experiment, indicate that B is more effective 
than A, or is there a high probability that one could obtain a value of 
S = 11 or lower by chance when B was actually no more effective than A? 

Step d. The distribution of the data, on the assumption that the null 
hypothesis is true, is obtained. In order to answer the question posed in 
step c, we assume that the null hypothesis is true, that is, that A and B 
are equally effective fly killers, and find the probability, α, that such a low 
value (or a lower one) as S = 11 could have been obtained under this 
assumption. If this probability is low, we reject the null hypothesis, 
knowing that there is a remote possibility (measured by α) that the null 
hypothesis is correct. 

To effect the above, we find the number of ways of obtaining the various 
possible values of S. We need to consider only one of the groups, in this 
instance the B group, since the rank numbers in the A group will be the 
four not used in B. Now, S = 10 may be obtained in only 1 way as the 
sum of 1, 2, 3, 4, while S = 12 may be obtained in 2 ways as the sum of 
either 1, 2, 3, 6 or 1, 2, 4, 5, etc. A table of such values is set out below: 


Sum of ranks, S         10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 
Ways to form sums, N     1  1  2  3  5  5  7  7  8  7  7  5  5  3  2  1  1 


A short-cut method of obtaining the various values of N is given by 
Wilcoxon.¹ In the next chapter, a short-cut method will also be given 
for determining the number of ways of dividing 8 quantities into 2 groups 
of 4 each. For the present we simply note that ΣN = 70, where the 
symbol Σ is used to mean “the sum of.” 

Now, there are only 2 possibilities out of a total of 70 in which B could 
appear as effective as (or better than) it did in the performed experiment. 
Or, stated another way, if A and B were equally effective, we could expect 
as good or better showing by B in only 2 out of 70 experiments on the 
average. 
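
The counting in step d can be checked by direct enumeration. The following is a minimal sketch in Python, not the short-cut method of the text; the names `ways`, `observed_s`, and `alpha` are chosen here for illustration only.

```python
from collections import Counter
from fractions import Fraction
from itertools import combinations

ranks = range(1, 9)   # ranks 1..8 for the eight batches
observed_s = 11       # rank sum for B in the experiment of step c

# Count the ways N of forming each rank sum S from 4 of the 8 ranks.
ways = Counter(sum(c) for c in combinations(ranks, 4))
total = sum(ways.values())

# Probability, under the null hypothesis, of a sum as low as the observed one.
alpha = Fraction(sum(n for s, n in ways.items() if s <= observed_s), total)

for s in sorted(ways):
    print(s, ways[s])
print("total ways:", total, "alpha:", float(alpha))
```

Brute-force enumeration is practical here only because there are so few ways of choosing 4 ranks from 8; larger experiments call for the short-cut methods mentioned above.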

Step e. A test of significance is performed. Usually we decide on an 
acceptable probability level, α, before the experiment is conducted, and 
if the results give a probability < α, we state that the null hypothesis is 
rejected at the α significance level. Two commonly used values of α are 
0.05 and 0.01. If the null hypothesis is rejected at the α = 0.05 level, the 
results are said to be significant; if at the α = 0.01 level, highly 
significant. 

In our experiment α = 2/70 = .03. Hence we might well reject the null 
hypothesis at the α = .03 probability level. In this case, if the null 
hypothesis was not rejected, we should be forced to accept the happening of an 
improbable event. 

Our experiment is not large enough ever to attain high significance, 
since even if every B kill was larger than the highest A kill, α could not be 
less than 1/70 = .014. In order to have an experiment sensitive enough 
to detect differences at the α = .01 level, more batches of flies must be 
used. In Chap. 7 we shall present a method of testing the null hypothesis 
which utilizes the actual number of flies killed. This so-called “t test” 
is more sensitive to the detection of real treatment differences than our 
ranking test. However, this new test requires the development of a more 
complex theoretical background. 
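
To see how many batches the ranking test would need, one can compute the smallest probability it can ever give. This is a sketch under the assumption that both sprays receive the same number of batches, k; the function name `smallest_alpha` is hypothetical.

```python
from math import comb

def smallest_alpha(batches_per_spray: int) -> float:
    """Smallest one-sided probability the ranking test can give.

    Only 1 of the C(2k, k) equally likely divisions of the ranks puts
    the k lowest ranks all in the same group.
    """
    k = batches_per_spray
    return 1 / comb(2 * k, k)

for k in range(4, 7):
    print(k, comb(2 * k, k), round(smallest_alpha(k), 4))
```

With 4 batches per spray the smallest attainable value is 1/70, as noted above; with 5 batches per spray it falls below .01.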

In tests like the t test the observations or sample values are specified as 
having been drawn from parent populations of known mathematical form 
containing one or more unknown parameters. In such cases the null 
hypothesis is a statement concerning the parameter(s). R. A. Fisher 
has defined the basic theoretical problems underlying modern applied 
statistics as those of specification, distribution, estimation, and tests of 
hypotheses. The sample observations are specified as having been drawn 
from some parent population with unknown parameter(s). Functions of 
the sample observations are calculated as estimators of these parameters. 
The mathematical forms of the distributions of these functions obtained 
from repeated samplings are derived. Properties of estimators are 
investigated in order that an appropriate estimator may be used in an 
applied situation. Appropriate test criteria are constructed in order that 
valid tests of hypotheses may be made. 

The representative investigation described above is concerned with an 
experiment. A second large class of investigations involving the use of 
statistical methodologies is that of the sample survey. Either may lead to 
estimates of population parameters, the setting of confidence limits, or 
tests of significance. 

Close inspection of the steps taken by the statistician in the above 
representative investigation will reveal the fact that most are similar or 
identical with those taken by workers engaged in many different fields of 
scientific enquiry. What essential characteristic then is peculiar to the 
professional activities of the statistician? A careful comparison will 
reveal that this essential characteristic is the use of the mathematics of 
probability to calculate from the observations themselves a measure of 
the fallibility of conclusions and estimates. However, valid and efficient 
measures of the fallibility of conclusions in terms of exact probability 
statements are possible only if the earlier steps in the investigation are 
taken with this end product in mind. Hence, the statistician finds himself 
vitally concerned with matters not strictly concerned with this main 
aspect, such as statement of the null hypothesis, design of experiments 
and sample surveys, questionnaire construction and training and supervision 
of enumerators, experimental techniques, collection and tabulation 
of data, specification of the parent population distribution, and interpretation 
of results. 

1.3. A Formal Definition of Statistics. With the above discussion 
in mind, statistics might be given the following formal definition: 

Statistics is the science and art of the development and application of the 
most effective methods of collecting, tabulating, and interpreting quantitative 
data in such a manner that the fallibility of conclusions and estimates may 
be assessed by means of inductive reasoning based on the mathematics of 
probability. 

Probability. It was stated in the definition that statistics uses 
inductive reasoning based on the mathematics of probability. In this 
respect, then, statistics is a branch of applied mathematics whose methodologies 
stem from the axioms and theorems of probability, which in turn 
is a branch of pure mathematics. A definition of the probability of the 
occurrence of an event would appear to be in order. Unfortunately there 
is no general agreement among workers in the field as to what constitutes 
a satisfactory definition. For reasons of simplicity the classical definition 
will be given, which is as follows: 

If an event can occur in N equally likely and mutually exclusive ways, and 
if n of these ways have an attribute A, then the probability of the occurrence of 
A is n/N. 

This is the definition of a priori probability, that is, it assumes that it is 
possible logically to determine, before trials are made, all the equally 
likely and mutually exclusive ways that an event may happen and to 
assign n of these ways to the occurrence of attribute A. 

Example 1.1. What is the probability of obtaining a head with a 
penny on a single toss? Assuming the coin a “true” coin, we reason that 
it may fall 2 equally likely ways and that 1 of them must be heads; hence, 
the probability is 1/2. 

Notice that the classical definition of probability, in using the words 
“equally likely,” assumes a knowledge of probability in order to define 
the term. Logically, of course, this is certainly undesirable, but a more 
satisfactory definition must await a higher level of mathematical maturity 
than that assumed for this text. 

In actual practice it would appear that a priori probability might have 
limited usefulness. It may not be possible in a great many important 
scientific problems to determine logically, before trials are made, all the 
equally likely and mutually exclusive ways that an event may happen 
and to assign n of these ways to the occurrence of attribute A. For 
instance, even in the example, the penny may have a tendency to turn up 
heads more often than tails, and the probability of heads, then, is no 
longer 1/2. But what is the probability of heads now? Suppose the penny 
is tossed 100 times and 55 heads are noted. The probability of a head 
might be tentatively set as .55. However, we have only an estimate 
based on 100 tosses of some unknown postulated “true” probability in 
a theoretical infinite population of throws. This estimate is called the 
empirical probability. 
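
The idea of an empirical probability can be illustrated by simulation. The sketch below assumes a hypothetical biased penny whose “true” probability of heads is .55; that value, the seed, and the function name `empirical_probability` are illustrative assumptions and not data from the text.

```python
import random

def empirical_probability(p_heads: float, tosses: int, seed: int = 0) -> float:
    """Estimate the probability of heads from a number of simulated tosses."""
    rng = random.Random(seed)  # fixed seed so that a run can be repeated
    heads = sum(rng.random() < p_heads for _ in range(tosses))
    return heads / tosses

for n in (100, 10_000, 100_000):
    print(n, empirical_probability(0.55, n))
```

The estimate from 100 tosses may differ noticeably from the postulated value, while estimates from longer runs tend to lie closer to it.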

What then is the connection between a priori probability and the sample 
estimate of some unknown hypothetical population probability, that 
is, the empirical probability? With the a priori definition and certain 
postulates the mathematics of probability provides fundamental laws, or 
theorems of probability, which in turn make possible the solution of many 
classes of problems. If the unknown hypothetical population probability 
and its sample estimate, the empirical probability, be assumed to be 
amenable to the same fundamental laws, then a means becomes available 
for solving many important problems in the empirical sciences. 

EXERCISES 

1.1. Using the last three observations for A and B in the data given in 
step c, test the same null hypothesis of step a. 

1.2. Add the two observations A (66), B (70) to the data of step c, and 
test the null hypothesis of step a. 

1.3. Follow the instructions of Exercise 1.2, adding only A (66). 

1.4. What is the a priori probability of obtaining a 7 with a pair of ordinary 
dice, if the dice are assumed to be “true”? Assuming that the pair 
of dice are not “true,” how could one obtain a reasonable estimate of the 
unknown probability of obtaining a 7, that is, the empirical probability? 

1.5. Read references cited 1 and 2. 
CHAPTER 2 

PROBABILITY 


2.1. Introduction. It was stated earlier that statistics concerns itself 
with inductive reasoning based on the mathematics of probability. In 
developing statistical methodology the statistician makes use of the 
definitions, postulates, and theorems of mathematical probability. What 
can be said of this framework, these “bones,” of statistics? The theory 
of probability had its genesis in the application of mathematics to determining 
the odds in various games of chance: dice, cards, spun wheels, etc. 
In particular, the foundations of the science of probability were laid by two 
seventeenth-century mathematicians, Pascal and Fermat, in their private 
correspondence concerning questions raised regarding the gambling 
observations of the French nobleman, Chevalier de Méré. Books on 
games of chance are still being written by workers in probability and 
statistics. 

Statistics is then no dry-as-dust subject concerning itself with the 
compilation of innumerable tables and charts. On the contrary, it deals 
with the development and application of an important methodology based 
on the fascinating subject of probability. This methodology has become 
of great importance as a research tool in the physical, biological, and 
social sciences. 

2.2. Number of Ways an Event Can Occur: Permutations and Combinations. 
The number of ways in which an event may occur may be 
determined by enumeration or by the use of some simple rules from college 
algebra. The latter method is simpler for more complicated problems. 
Two fundamental theorems are: 

Rule 2.1. If A can happen in m ways and B in n ways independent (the 
occurrence of one does not affect the chance of the occurrence of the other) of m, 
then both A and B can happen in mn ways. 

Example 2.1. If two ordinary dice numbered 1 and 2 are tossed, one 
may appear face up in 6 ways which are independent of the 6 ways in 
which the second may appear. Hence, both may appear face up together 
in 36 different ways. 

Rule 2.2. If A can happen in m ways and B in n ways mutually exclusive 
(the occurrence of one precludes the occurrence of the other) of m, then either 
A or B can happen in m + n ways. 

Example 2.2. Either an ace or a king (one card) may be drawn from 
an ordinary deck of cards in 4 + 4 = 8 ways. 

In multiple arrangements, a rule for permutations can be applied. If 
it be desired to arrange n different objects into sets of r objects per set, the 
number of such arrangements is called “the number of permutations of n 
objects taken r at a time” and is indicated by P(n,r). The first of the r 
positions can be filled in n ways, the second in (n - 1) ways since one 
object will have been used in the first position, the third in (n - 2) ways, 
etc. Hence, by Rule 2.1: 

Rule 2.3. $P(n,r) = n(n - 1)(n - 2) \cdots (n - r + 1) = \dfrac{n!}{(n - r)!}$, 
where $n! = n(n - 1) \cdots 2 \cdot 1$. 

It should be noted that, when r = n, P(n,n) = n!, which implies that 0! = 1. 

Example 2.3. The number of different ways of selecting a president, 
vice-president, and secretary from a suggested slate of 6 is 

P(6,3) = 6 • 5 • 4 = 120. 

Suppose that all n objects in the arrangement are used, but certain 
groups $n_1$, $n_2$, etc., are alike. Any rearrangement of the objects of any $n_i$ 
group will not change any particular arrangement; hence, the total number 
of arrangements will be less than if all the objects were different from 
one another. Now, any group of $n_i$ alike objects can be arranged in $n_i!$ 
ways, and since these arrangements are alike for every arrangement 
of the other objects, the total number of different arrangements will be 
given as below: 

Rule 2.4. $P(n; n_1, n_2, n_3, \ldots) = \dfrac{n!}{n_1!\, n_2!\, n_3! \cdots}$, where $P(n; n_1, n_2, n_3, \ldots)$ 
represents the total number of permutations, given that $n_1$ are alike, 
$n_2$ alike but different from the first group, etc., and $\Sigma n_i = n$. 

Example 2.4. How many different 6-flag signals may be made if 3 
are red, 2 blue, and 1 yellow? $P(6; 3, 2, 1) = \dfrac{6!}{3!\,2!\,1!} = 60$. 

If interest lies only in groups of objects and not in the arrangements 
within the groups, then combinatorial rules apply. The total number of 
combinations of n objects taken r at a time is denoted symbolically as 
C(n,r). It is easily seen that 

P(n,r) = C(n,r)P(r,r), 

since each combination of r objects may be permuted in P(r,r) ways. The 
following rule is now derived: 

Rule 2.5. $C(n,r) = \dfrac{P(n,r)}{P(r,r)} = \dfrac{n!}{(n - r)!\,r!}$. 
Example 2.5. The total number of different bridge hands of 13 cards 
which can be dealt from a deck of 52 cards is $C(52,13) = \dfrac{52!}{39!\,13!}$. Again, 
the total number of sets of 4 bridge hands is 

$C(52,13) \cdot C(39,13) \cdot C(26,13) \cdot C(13,13) = \dfrac{52!}{(13!)^4}$. 
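
Rules 2.3 to 2.5 can be checked numerically. The sketch below uses `math.perm` and `math.comb` from the Python standard library (version 3.8 or later); the helper `arrangements_with_alike_groups` is a hypothetical name introduced here for Rule 2.4.

```python
from math import comb, factorial, perm

def arrangements_with_alike_groups(*group_sizes: int) -> int:
    """Rule 2.4: n! / (n_1! n_2! ...), where n is the sum of the group sizes."""
    result = factorial(sum(group_sizes))
    for size in group_sizes:
        result //= factorial(size)  # each division is exact
    return result

print(perm(6, 3))                               # Rule 2.3, Example 2.3
print(arrangements_with_alike_groups(3, 2, 1))  # Rule 2.4, Example 2.4
print(perm(8, 4) == comb(8, 4) * perm(4, 4))    # P(n,r) = C(n,r)P(r,r)
print(comb(52, 13))                             # Rule 2.5, Example 2.5

# Sets of 4 bridge hands, computed both ways as in Example 2.5.
sets_of_hands = comb(52, 13) * comb(39, 13) * comb(26, 13) * comb(13, 13)
print(sets_of_hands == factorial(52) // factorial(13) ** 4)
```

Integer arithmetic is used throughout so that the large factorials are handled exactly.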

2.3. Stirling’s Approximation. The use of the rules of permutations 
and combinations involves factorials, some with quite large values of n. 
Stirling’s formula, 

$n! = \sqrt{2\pi n}\, n^n e^{-n} \left(1 + \dfrac{1}{12n} + \dfrac{1}{288n^2} + \cdots\right)$, 

may be used to obtain quickly an approximation to n!. The first term, 

$n! \approx \sqrt{2\pi n}\, n^n e^{-n}$, 

gives a suitable approximation in many cases. 

Example 2.6. Evaluate 13! by the use of Stirling’s formula. 

$13! \approx \sqrt{2\pi(13)}\, 13^{13} e^{-13} \left(1 + \dfrac{1}{(12)(13)}\right)$, 
$\log 13! \approx \tfrac{1}{2}(\log 26 + \log \pi) + 13 \log 13 - 13 \log e + \log \tfrac{157}{156}$, 
$13! \approx 6.2271 \times 10^9$, 

using a 5-place logarithm table. 
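
The quality of the approximation can be examined by comparing it with the exact factorial. This is a minimal sketch using floating-point arithmetic rather than a logarithm table, so its last digits need not agree with those of Example 2.6; the function name `stirling` and its `correction` switch are illustrative.

```python
from math import e, factorial, pi, sqrt

def stirling(n: int, correction: bool = True) -> float:
    """Stirling's approximation to n!, optionally with the 1/(12n) term."""
    first_term = sqrt(2 * pi * n) * n**n * e ** (-n)
    return first_term * (1 + 1 / (12 * n)) if correction else first_term

exact = factorial(13)
for use_correction in (False, True):
    approx = stirling(13, use_correction)
    relative_error = abs(approx - exact) / exact
    print(use_correction, f"{approx:.4e}", f"{relative_error:.1e}")
```

The first term alone is in error by somewhat less than 1 per cent for n = 13, and the 1/(12n) term removes most of that error.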

2.4. Probability and Arrangements. After obtaining the total 
number of mutually exclusive and equally likely ways and those which 
possess attribute A by the use of the rules of permutations and combinations, 
it is then possible to write the required probability by applying the 
fundamental definition 

$\mathcal{P} = \dfrac{\text{number of ways that possess attribute } A}{\text{total number of ways}}$. 

Example 2.7. A bag contains 4 red and 3 white balls. What is the 
probability of obtaining exactly 3 red balls when 3 balls are drawn? 

$\mathcal{P} = \dfrac{C(4,3)}{C(7,3)} = \dfrac{4}{35}$. 


2.5. Fundamental Laws of Probability 

Law 1. If A and B are two mutually exclusive events (the occurrence of 
one precludes the occurrence of the other), then the probability of either of 
them happening is the sum of the respective probabilities. Symbolically, 
$\mathcal{P}(A + B) = \mathcal{P}(A) + \mathcal{P}(B)$. 

Example 2.8. The probability of throwing either a 7 or an 8 with 
two dice is 6/36 + 5/36 = 11/36. 

Law 2. If A and B are two independent events, so that the occurrence of 
one does not affect the chance of the occurrence of the other, the probability 
that both happen is the product of their respective probabilities. Symbolically, 
$\mathcal{P}(AB) = \mathcal{P}(A) \cdot \mathcal{P}(B)$. 

Example 2.9. The probability of getting 2 red balls in drawing 1 ball 
from each of two urns, each containing 6 red balls and 4 black balls, is 
6/10 · 6/10 = 9/25. 

If A and B are not independent of one another so that the occurrence 
of one affects the probability of the occurrence of the other, then a definition 
is needed for the conditional probability of A given that B has 
occurred, which is denoted $\mathcal{P}(A \mid B)$. Similarly the conditional probability 
of B given that A has happened is $\mathcal{P}(B \mid A)$. In such cases fundamental 
law 2 becomes: 

Law 3. $\mathcal{P}(AB) = \mathcal{P}(A) \cdot \mathcal{P}(B \mid A) = \mathcal{P}(B) \cdot \mathcal{P}(A \mid B)$. If A and B are 
independent, $\mathcal{P}(B \mid A) = \mathcal{P}(B)$ and $\mathcal{P}(A \mid B) = \mathcal{P}(A)$. 

Example 2.10. If both balls were drawn in succession from one of the 
urns in Example 2.9 without replacement, then the probability of obtaining 
2 red balls is 6/10 · 5/9 = 1/3. 

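Law 4. If two events are not mutually exclusive, then the probability of 
at least one of them occurring is $\mathcal{P}(A + B) = \mathcal{P}(A) + \mathcal{P}(B) - \mathcal{P}(AB)$. 

Proof. Let $\bar{A}$ and $\bar{B}$ represent the nonoccurrence of A and B, respectively. 
Then, 

$\mathcal{P}(A) + \mathcal{P}(B) = \mathcal{P}(AB) + \mathcal{P}(A\bar{B}) + \mathcal{P}(BA) + \mathcal{P}(B\bar{A})$ 
$\qquad = \mathcal{P}(AB) + \mathcal{P}(A + B)$. 

Hence, 

$\mathcal{P}(A + B) = \mathcal{P}(A) + \mathcal{P}(B) - \mathcal{P}(AB)$, 

where $\mathcal{P}(A\bar{B})$ is the probability of A occurring and B not occurring, and 
similarly for $\mathcal{P}(B\bar{A})$. 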

Law 4 is illustrated in Fig. 2.1, where the outcomes of a chance event 
are represented by points in a plane. Then the outcomes belonging to 
either A or B may be represented by the points in A and B less the points 
common to region AB since AB would be counted twice. 

This law may be extended, for example, 

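$\mathcal{P}(A + B + C) = \mathcal{P}(A) + \mathcal{P}(B) + \mathcal{P}(C) - \mathcal{P}(AB) - \mathcal{P}(AC) - \mathcal{P}(BC) + \mathcal{P}(ABC)$. 

Example 2.11. In Example 2.9 the probability of obtaining at least 
one red ball is 

$\mathcal{P}(A + B) = \mathcal{P}(A) + \mathcal{P}(B) - \mathcal{P}(AB) = \tfrac{6}{10} + \tfrac{6}{10} - \tfrac{6}{10} \cdot \tfrac{6}{10} = \tfrac{21}{25}$. 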
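
Laws 1 to 4 can be verified for the dice and urn examples by listing the equally likely outcomes and counting. This is a sketch only; the helper `prob` and the variable names are introduced here for illustration, and each urn is taken to hold 6 red and 4 black balls.

```python
from fractions import Fraction
from itertools import product

def prob(event, outcomes):
    """Classical probability: favorable equally likely outcomes over total."""
    outcomes = list(outcomes)
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

# Law 1, Example 2.8: a total of 7 or 8 with two dice.
dice = list(product(range(1, 7), repeat=2))
p_7_or_8 = prob(lambda d: sum(d) in (7, 8), dice)

# Laws 2 and 4, Examples 2.9 and 2.11: one ball from each of two urns.
urn = "R" * 6 + "B" * 4                 # 6 red and 4 black balls
draws = list(product(urn, urn))
p_both_red = prob(lambda d: d == ("R", "R"), draws)
p_at_least_one_red = prob(lambda d: "R" in d, draws)

# Law 3, Example 2.10: two balls from one urn without replacement.
p_two_red_one_urn = Fraction(6, 10) * Fraction(5, 9)

print(p_7_or_8, p_both_red, p_at_least_one_red, p_two_red_one_urn)
```

Exact fractions are used so that the results can be compared directly with the values in the examples.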




Law 5. If the probability of an event occurring in a single trial is p, 
the probability of its occurring r times out of n trials is given by 

$C(n,r)\,p^r (1 - p)^{n-r} = C(n,r)\,p^r q^{n-r}$, 

where 1 - p = q. This is the (r + 1)st term of $(q + p)^n$. 

Proof. If the event occurs r times out of n trials, it will fail to occur 
n - r times; hence, the probability of the occurrence of any sequence of r 
successes and n - r failures is $p^r (1 - p)^{n-r}$. But the number of possible 
sequences is given by C(n,r). 

Example 2.12. The probability of obtaining exactly 3 heads on a 
single toss of 5 coins is $C(5,3)\left(\tfrac{1}{2}\right)^3 \left(\tfrac{1}{2}\right)^2 = \tfrac{5}{16}$. 
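
Law 5 translates directly into a short computation. The following sketch uses exact fractions; the function name `binomial_probability` is hypothetical.

```python
from fractions import Fraction
from math import comb

def binomial_probability(n: int, r: int, p: Fraction) -> Fraction:
    """Law 5: probability of exactly r occurrences in n independent trials."""
    q = 1 - p
    return comb(n, r) * p**r * q ** (n - r)

half = Fraction(1, 2)
print(binomial_probability(5, 3, half))                         # Example 2.12
print(sum(binomial_probability(5, r, half) for r in range(6)))  # all terms of (q + p)^5
```

Summing over r = 0, 1, . . . , n gives all the terms of $(q + p)^n$, so the total must be 1.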

2.6. A Posteriori Probability. In the previous sections it was assumed 
that the causal system was known a priori; hence, exact probabilities of 
various results were readily calculated. In tossing a die it was assumed 
that all 6 faces appeared “equally likely” and that a random toss of the 
die was made. In such cases the probability of obtaining any number 
from 1 to 6 was easily seen to be 1/6. With these same assumptions and 
use of the fundamental laws of probability it was also easy to state the 
probability of obtaining a 7 on a single throw, say, with two dice. 

In statistics, however, one is often faced with exactly the reverse of this 
situation. A batch of data resulting from some experiment is at hand, 
and we wish to state the probability that such data could have been produced 
by a given causal system. For example, it is noted that two 
hundred 7’s were obtained in tossing two dice 1,000 times. We now wish 
to know the probability that such a result could have been produced with 
unbiased dice. Or we may wish to state, on the basis of these results, the 
expected number of 7’s to be obtained on the next 100 tosses of two dice. 
These problems concern a posteriori probability, probability based on 
previous occurrences. 

A posteriori probabilities, under certain conditions, may be obtained by 
the use of Bayes’s formula. Let $B_1, B_2, \ldots, B_n$ be n mutually exclusive 
random events, of which one is certain to occur. Let $\mathcal{P}(B_i)$ be the probability 
of the occurrence of $B_i$. Let E be an event which can occur only if 
one of the set of B’s occurs. Let $\mathcal{P}(E \mid B_i)$ be the conditional probability 
for E to occur, assuming the occurrence of $B_i$. We wish to know how the 
probability of $B_i$ changes with the added information that E has actually 
happened. In other words, we wish the conditional probability $\mathcal{P}(B_i \mid E)$. 
Using law 3, 

$\mathcal{P}(EB_i) = \mathcal{P}(B_i)\,\mathcal{P}(E \mid B_i)$ 

and 

$\mathcal{P}(EB_i) = \mathcal{P}(E)\,\mathcal{P}(B_i \mid E)$. 

Equating the right-hand sides of these two equations and solving for 
$\mathcal{P}(B_i \mid E)$, we obtain 

$\mathcal{P}(B_i \mid E) = \dfrac{\mathcal{P}(B_i)\,\mathcal{P}(E \mid B_i)}{\mathcal{P}(E)}$. 

Since E may occur with any of the B ( mutually exclusive events, 

<?(E) = (P(£Bj) + (P(EB t ) + • • • + <p (EB n ) 

= (PiBJViElBJ + <P{B t )<p(E\Bi) + ■ ■ ■ + (?{B n )(P(E\B n ). 

Upon substituting this last result in the denominator of the preceding 
equation, we obtain Bayes’s formula, 


<?(BAE) = _ _ (P(Bj)(P(E \Bi) 

• • • + (9{B n )(?{E\B~) 

If we consider the events $B_1, B_2, \ldots, B_n$ as hypotheses to account for the occurrence of $E$, then Bayes's formula provides a means of calculating probabilities of hypotheses. In this case $\mathcal{P}(B_1), \mathcal{P}(B_2), \ldots, \mathcal{P}(B_n)$ are called a priori probabilities of the hypotheses $B_1, B_2, \ldots, B_n$, and $\mathcal{P}(B_1|E), \mathcal{P}(B_2|E), \ldots, \mathcal{P}(B_n|E)$ are called a posteriori probabilities of the same hypotheses.

Example 2.13. Urn I contains 2 white, 1 black, and 3 red balls. 
Urn II contains 3 white, 2 black, and 4 red balls. Urn III contains 4 
white, 3 black, and 2 red balls. One urn is chosen at random, and 2 balls 
are drawn. They happen to be red and black. What is the probability 
that both balls came from urn I? urn II? urn III? 

We identify $E$ as the event that the 2 balls were, respectively, red and black. To explain this occurrence, we have three hypotheses: the urn was I, II, or III. We identify these hypotheses with $B_1$, $B_2$, $B_3$.

Then $\mathcal{P}(B_1) = \mathcal{P}(B_2) = \mathcal{P}(B_3) = \tfrac13$, and $\mathcal{P}(E|B_1) = \tfrac36\cdot\tfrac15 + \tfrac16\cdot\tfrac35 = \tfrac15$. Similarly, $\mathcal{P}(E|B_2) = \tfrac29$, and $\mathcal{P}(E|B_3) = \tfrac16$. Substituting in Bayes's formula,

$$\mathcal{P}(B_1|E) = \frac{\tfrac13\cdot\tfrac15}{\tfrac13\cdot\tfrac15 + \tfrac13\cdot\tfrac29 + \tfrac13\cdot\tfrac16} = \frac{18}{53}.$$

Similarly,

$$\mathcal{P}(B_2|E) = \frac{20}{53}, \qquad \mathcal{P}(B_3|E) = \frac{15}{53}.$$

Note that the sum, as we should expect, is 1.
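The arithmetic of Example 2.13 can be reproduced with a short Python sketch of Bayes's formula. The helper names `posterior` and `red_and_black` are illustrative assumptions, not notation from the text; the likelihood counts a red and a black ball in either order of drawing, as the calculation in the example does.

```python
from fractions import Fraction

def posterior(priors, likelihoods):
    """Bayes's formula: P(B_i | E) for mutually exclusive, exhaustive B_i."""
    joint = [p * l for p, l in zip(priors, likelihoods)]  # P(B_i) P(E | B_i)
    total = sum(joint)                                    # P(E)
    return [j / total for j in joint]

def red_and_black(white, black, red):
    """P(one red and one black in 2 draws without replacement, either order)."""
    n = white + black + red
    return Fraction(2 * red * black, n * (n - 1))

priors = [Fraction(1, 3)] * 3  # one urn chosen at random
likelihoods = [red_and_black(2, 1, 3),   # urn I
               red_and_black(3, 2, 4),   # urn II
               red_and_black(4, 3, 2)]   # urn III
post = posterior(priors, likelihoods)
print(likelihoods)  # conditional probabilities 1/5, 2/9, 1/6
print(post)         # a posteriori probabilities 18/53, 20/53, 15/53
```

Because the a priori probabilities are equal here, they cancel, and the a posteriori probabilities are simply proportional to the conditional probabilities of $E$.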

Example 2.14. (Due to Neyman.) Consider the cross of two hybrids $x_1$ and $x_2$ which are heterozygous (Aa) and its progeny $y_1$ having the appearance of a dominant, that is, either doubly dominant (AA) or heterozygous (Aa). Suppose that $y_1$ is crossed with a recessive (aa) designated as $y_2$, resulting in $n$ offspring: $z_1, z_2, \ldots, z_n$. Suppose that not one of these offspring is a recessive, that is, each is either a dominant or a hybrid. It is proposed to find the probability that $y_1$ is (Aa).

Let $E$ be the event that the $n$ offspring have the appearance of dominants; $B_1$ may be identified with the event that $y_1$ is (Aa) and $B_2$ with the event that $y_1$ is (AA). Then, $\mathcal{P}(B_1) = \mathcal{P}(y_1 = Aa) = \tfrac23$, $\mathcal{P}(B_2) = \mathcal{P}(y_1 = AA) = \tfrac13$,

$$\mathcal{P}(E|B_1) = \mathcal{P}(E|y_1 = Aa) = \frac{1}{2^n},$$

and $\mathcal{P}(E|B_2) = \mathcal{P}(E|y_1 = AA) = 1$. Substituting in Bayes's formula,

$$\mathcal{P}(B_1|E) = \mathcal{P}(y_1 = Aa|E) = \frac{\tfrac23\cdot(1/2^n)}{\tfrac23\cdot(1/2^n) + \tfrac13\cdot 1} = \frac{1}{1 + 2^{n-1}}.$$

Giving $n$ the values 1, 2, 3, 4, 5, we obtain Table 2.1.

Table 2.1

A Posteriori Probabilities

n    $\mathcal{P}(y_1 = Aa|E)$
1    .500
2    .333
3    .200
4    .111
5    .059
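The entries of Table 2.1 may be regenerated from Bayes's formula. The sketch below is illustrative only; the function name `prob_heterozygous` is an assumption, and the a priori probabilities 2/3 and 1/3 are those that follow from a cross of two heterozygotes whose progeny shows the dominant appearance.

```python
from fractions import Fraction

def prob_heterozygous(n):
    """A posteriori probability that y1 is (Aa), given n dominant-looking offspring."""
    prior_Aa, prior_AA = Fraction(2, 3), Fraction(1, 3)
    like_Aa = Fraction(1, 2**n)  # each offspring of Aa x aa looks dominant with probability 1/2
    like_AA = Fraction(1)        # every offspring of AA x aa looks dominant
    return prior_Aa * like_Aa / (prior_Aa * like_Aa + prior_AA * like_AA)

for n in range(1, 6):
    p = prob_heterozygous(n)
    print(n, p, f"{float(p):.3f}")
```

Each value agrees with the closed form $1/(1 + 2^{n-1})$, so the probability that $y_1$ is heterozygous falls off roughly by half with every additional dominant-looking offspring.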

Suppose, however, that we do not know anything about the origin of $y_1$. In that case the a priori probabilities $\mathcal{P}(B_1) = \mathcal{P}(y_1 = Aa)$ and $\mathcal{P}(B_2) = \mathcal{P}(y_1 = AA)$ would be unknown, and we would not have sufficient information to evaluate the right side of Bayes's formula. In the past it has been suggested that since $\mathcal{P}(B_1)$ and $\mathcal{P}(B_2)$ are unknown, and we have no reason to favor one more than another, we should assign one-half to each. Modern statistics provides other ways of attacking such problems which seem more reasonable. These methods will be discussed later.

EXERCISES

2.1. An agronomist is designing an experiment involving the use of 4 
varieties, 3 fertilizers, and 3 spacings. How many different treatment 
combinations, using one from each of the three kinds of treatments, does 
he have? 

2.2. In how many different ways may a Jersey or a Holstein be 
drawn from a mixed herd of 5 Jerseys, 7 Holsteins, 10 Guernseys, and 6 
Brahmans ? 

2.3. How many different ways may a horticulturist arrange 5 different 
potted plants along a line on a greenhouse bench? 

2.4. How many different ways may a student select a major and a 
minor from 5 possible fields? 

2.5. How many different arrangements can be made using the 10 letters from the word “statistics”?

2.6. How many signals can be made by hoisting 6 flags of different 
colors if there are 6 significant positions on the flagpole? Any number of 
the flags may be hoisted at a time. 

2.7. An organism has the possibility of having 1, 2, 3, 4, or 5 out of a 
total of 15 characters. What are the total possible combinations? 

2.8. An industrial engineer is designing an experiment arranged to measure sources of variation from 4 factors (runs, journeys, cylinders, and pots). If we let $R$, $J$, $C$, and $P$ represent the respective factors, how many 2-factor interactions of the type $RJ$, etc., are there? How many 3-factor interactions? How many 4-factor interactions?

2.9. Using the relationship

$$(1 + x)^n = 1 + C(n,1)x + C(n,2)x^2 + \cdots + C(n, n-1)x^{n-1} + C(n,n)x^n,$$

show that

$$2^n - 1 = C(n,1) + C(n,2) + \cdots + C(n, n-1) + C(n,n).$$

How many ways can we make a selection of 5 breeds of chickens, taking some or all?

2.10. Show that $C(n,r) = C(n, n - r)$.

2.11. If $C(n,10) = C(n,6)$, find $C(n,3)$.

2.12. If $C(16,r) = C(16, r - 2)$, find $r$.

2.13. If $P(56, r + 6)/P(54, r + 3) = 30{,}800$, find $r$.

2.14. A random sample of size $n$ from a finite population of $N$ sampling units is one in which every possible combination of size $n$ has an equal chance of being chosen. How many different samples of size 10 may be drawn from a list of 100 names? Use Stirling's approximation to evaluate the factorials.

2.15. Suppose that in selecting the sample of size 10 in Exercise 2.14 we draw a number from 1 to 10 at random, say 6, and select every tenth name on our list thereafter, that is, 16, 26, etc. Is this method of selection equivalent to random sampling? Why?

2.16. From a pack of 52 cards, 2 are drawn at random; find the probability that one is a queen and the other a king.

2.17. There are three events A, B, C, one of which must, and only one 
can happen; the probability of A not happening is A, and the probability 
of B not happening is f. Find the probability of C happening. 

2.18. The probability of A solving a certain problem is T , and the prob¬ 
ability of B solving the same problem is A- What is the probability that 

the problem will be solved if both try? . 

2.19. In a family with 6 children, what is the probability that (a) all children will be girls; (b) all children will be of the same sex; (c) the first 5 children will be boys and the sixth a girl; (d) 3 of the children will be boys? Assume the sex ratio is $\tfrac12$.

2.20. Show that under the conditions of law 5 the probability that an event happens at least $r$ times in $n$ trials is

$$C(n,r)\,p^r q^{n-r} + C(n, r+1)\,p^{r+1} q^{n-r-1} + \cdots + p^n,$$

or the sum of the last $(n - r + 1)$ terms of the expansion of $(q + p)^n$.

2.21. A lady declares that by tasting a cup of tea made with milk she can discriminate whether the milk or the tea infusion was first added to the cup. Eight cups of tea were mixed, 4 in one way and 4 in the other, and the lady was so informed. The cups were then presented, in random order, to her for judgment. She was asked to divide the 8 cups into two sets of 4, agreeing, if possible, with the treatments received. The lady selected 3 right and 1 wrong in each set of the same treatment. On the assumption that the lady cannot discriminate between the two methods, show that the probability of her doing as well or better by chance is $\tfrac{17}{70}$.

2.22. An urn contains 6 balls which are known to be all red or 4 red and 
2 black. A ball is drawn and found to be red. What is the probability 
that all the balls are red? 

2.23. A male rat is either doubly dominant (AA) or heterozygous (Aa), and, owing to Mendelian properties, the probability of either being true is $\tfrac12$. The male rat is bred to a doubly recessive (aa) female. If the male rat is doubly dominant, the offspring will exhibit the dominant characteristic; if heterozygous, the offspring will exhibit the dominant characteristic $\tfrac12$ of the time and the recessive characteristic $\tfrac12$ of the time. Suppose all of 3 offspring exhibit the dominant characteristic; what is the probability that the male is doubly dominant?

2.24. Chevalier de Méré's problem was concerned with a certain game of dice. Twenty-four throws of a pair of dice are to be allowed, and the player is permitted to bet even money either on the occurrence of at least one “double six” in the course of the 24 throws or against it. Certain theoretical considerations led De Méré to believe that betting on the double six is advantageous. On the other hand his empirical trials appeared to contradict this conclusion. Pascal's solution stated that, if the dice are “fair,” the probability of obtaining at least one double six in 24 throws is .491. Check Pascal's results.

CHAPTER 3

UNIVARIATE PARENT POPULATION DISTRIBUTIONS 

3.1. Specification. In Chap. 2, we were interested in obtaining the probability of the occurrence of a single chance event. It will be recalled from Chap. 1 that in order to arrive at the end product, a test of a hypothesis in the representative statistical investigation, we needed the probabilities for a complete set of chance events. In that investigation the chance events were various sums of ranks, $S$. A table of the possible values which a chance event may assume with a corresponding probability for each value is called a probability distribution for the parent population. This distribution is given in Table 3.1 for the parent population of values of $S$. In this table, the variable $x$ is used to represent the chance event $S$ and in such cases is called a chance variable or variate.

Table 3.1 

Probability Distribution of Sum of Ranks , 

x 10 11 12 13 14 15 16 17 18 19 20 21 22 23 _ 

( X ) tT. A A TT A A A A A A TO A A A A A tt> 

Ordinarily, in applied statistics, specification is accomplished by selecting a mathematical function, for example, the normal, binomial, or Poisson, on the basis of theoretical or empirical evidence and stating that the observations form a sample of all possible values of the variate. Quoting from R. A. Fisher,¹ “... we may know by experience what forms are likely to be suitable, and the adequacy of our choice may be tested a posteriori. We must confine ourselves to those forms which we know how to handle, or for which any tables which may be necessary have been constructed.”

3.2. Discrete Distributions. Functions like $f(x)$ in Table 3.1 are called discrete probability distribution functions to distinguish distributions of this type from continuous probability distribution functions, to be discussed later. The various values of $f(x)$ may be thought of as giving the relative frequencies of occurrence corresponding to the particular values of $x$. Since some one of the 17 events must occur on any one trial, the sum of all the probabilities is 1 or symbolically

$$\sum_x f(x) = 1. \qquad (1)$$

The distinguishing characteristic of the discrete distribution is that the variate can take only isolated values, such as the whole numbers $x$ in Table 3.1.

3.3. The Binomial Distribution. This discrete distribution is the distribution of successes, $x$, in $n$ repeated independent trials, in which the probability of success on any trial is a constant $p$. It has been named the binomial distribution because the successive probabilities are given by the respective terms of the expansion of the binomial $(q + p)^n$, where $q = 1 - p$. One property of the binomial theorem is that the $(x + 1)$st term of the expansion is

$$f(x) = C(n,x)\,p^x q^{n-x}, \qquad 0 \le x \le n, \qquad (2)$$

which by the methods of Chap. 2 also gives the probability of exactly $x$ successes in $n$ trials. On the right side of (2), $x$ is the variate, and $p$ and $n$ are parameters. Since

$$\sum_{x=0}^{n} f(x) = (q + p)^n = 1,$$

this distribution fulfills the requirement that the sum of the probabilities be 1. Individual terms and partial sums for various numerical values of $p$, $n$, and $x$ for the binomial distribution are given in reference 2.

Example 3.1. Given that the probability of drawing a tenant farm is $\tfrac13$ and that samples of 5 farms are drawn, the respective probabilities of obtaining 0, 1, 2, 3, 4, 5 tenant farms in a single sample are $\left(\tfrac23\right)^5$; $5\left(\tfrac13\right)\left(\tfrac23\right)^4$; $10\left(\tfrac13\right)^2\left(\tfrac23\right)^3$; $10\left(\tfrac13\right)^3\left(\tfrac23\right)^2$; $5\left(\tfrac13\right)^4\left(\tfrac23\right)$; $\left(\tfrac13\right)^5$, or $\tfrac{32}{243}$; $\tfrac{80}{243}$; $\tfrac{80}{243}$; $\tfrac{40}{243}$; $\tfrac{10}{243}$; $\tfrac{1}{243}$.

The probabilities $f(x)$ of obtaining $x = 0, 1, 2, 3, 4, 5$ tenant farms may be shown graphically as in Fig. 3.1.

If the probabilities are accumulated and graphed as in Fig. 3.2, then some $F(a)$ value gives the probability of obtaining a value of $x$ less than or equal to $a$, that is, $F(a) = \mathcal{P}(x \le a)$. This step function is called a cumulative distribution or an ogive.

Note that points of discontinuity occur for each whole number on the $x$ axis and that $F(5) = \mathcal{P}(x \le 5) = 1$.


Fig. 3.2. Cumulative distribution of tenant-farm probabilities. [F(x) axis given in parts of 243.]
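The accumulation that produces the step function of Fig. 3.2 may be sketched in Python. The value $p = \tfrac13$ used below is an assumption, chosen because it is consistent with the $F(x)$ axis being given in parts of $243 = 3^5$; the variable names are illustrative.

```python
from fractions import Fraction
from itertools import accumulate
from math import comb

n, p = 5, Fraction(1, 3)  # assumed probability of drawing a tenant farm
q = 1 - p

# f(x) = C(n, x) p^x q^(n - x): probability of exactly x tenant farms
f = [comb(n, x) * p**x * q**(n - x) for x in range(n + 1)]
# F(a) = P(x <= a): running total of the f(x)
F = list(accumulate(f))

for x in range(n + 1):
    print(x, f[x] * 243, F[x] * 243)  # both columns in parts of 243
```

The last cumulative value is 243 parts of 243, that is, $F(5) = 1$, and the jump at each whole number $x$ equals $f(x)$.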

3.4. The Poisson Distribution. Another discrete distribution of importance in applied statistics is the Poisson distribution. The Poisson distribution may be derived as a limiting form of the binomial distribution when $p$ is very small but $n$ is so large that $np$ is a finite constant, equal to $m$, say. To see this, consider the binomial distribution:

$$f(x) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\,p^x q^{n-x}.$$

Since $p = m/n$,

$$f(x) = \frac{n(n-1)(n-2)\cdots(n-x+1)}{x!}\left(\frac{m}{n}\right)^x\left(1 - \frac{m}{n}\right)^{n-x} = \frac{(1 - 1/n)(1 - 2/n)\cdots[1 - (x-1)/n]\,m^x\,(1 - m/n)^{n-x}}{x!}.$$

Then

$$\lim_{n\to\infty} f(x) = \frac{m^x e^{-m}}{x!}.$$

Obviously this function is also a function of $x$; hence the Poisson distribution may be written as

$$f(x) = e^{-m}\,\frac{m^x}{x!}, \qquad 0 \le x < \infty.$$

Since

$$\sum_{x=0}^{\infty} \frac{m^x}{x!} = e^m,$$

then

$$\sum_{x=0}^{\infty} f(x) = 1.$$

This distribution was named for Poisson, having been given first by him in 1837. Individual terms and partial sums for various numerical values of $m$ and $x$ for the Poisson distribution have been made available by Molina.³

It should be noted that the Poisson distribution is a one-parameter family, $m$ being the parameter.

Example 3.2. A bag of clover seed is known to contain 1 per cent weed seeds. A sample of 100 seeds is drawn. Since

$$m = np = 100(.01) = 1$$

and $e^{-1} = .3679$, the probabilities of 0, 1, 2, 3, . . . , weed seeds being in the sample are

Number of weed seeds   0      1      2      3      4      5      6      7
Probability            .3679  .3679  .1839  .0613  .0153  .0031  .0005  .0001
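As a rough check on Example 3.2, the Poisson probabilities can be set beside the exact binomial probabilities they approximate. This is a minimal sketch; the function names `poisson_pmf` and `binomial_pmf` are illustrative and not from the text.

```python
from math import comb, exp, factorial

def poisson_pmf(x, m):
    """Poisson probability e^(-m) m^x / x!."""
    return exp(-m) * m**x / factorial(x)

def binomial_pmf(x, n, p):
    """Exact binomial probability C(n, x) p^x q^(n - x)."""
    return comb(n, x) * p**x * (1 - p)**(n - x)

n, p = 100, 0.01  # 100 seeds, 1 per cent weed seeds
m = n * p         # m = np = 1

for x in range(8):
    # columns: number of weed seeds, Poisson value, exact binomial value
    print(x, f"{poisson_pmf(x, m):.4f}", f"{binomial_pmf(x, n, p):.4f}")
```

With $n = 100$ and $p = .01$ the two columns differ only in the third decimal place, which illustrates why the limiting form is serviceable when $p$ is small and $n$ is large.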

3.5. Continuous Distributions. If measurements instead of counts constitute the data under consideration, then the hypothetical parent population distribution is usually that of a continuous variate instead of a discrete variate. Snedecor⁴ gives the histogram of Fig. 3.3 for the gains in weight of 100 swine. Before powerful mathematical methods may be applied to derive a methodology providing techniques for statistical inferences, it is desirable to “idealize” the histogram into a curve which may be represented by a mathematical function. Such a process takes place in other branches of applied mathematics, for example, in surveying. Before the surveyor can be furnished with a powerful methodology for the solution of his practical problems in mensuration, it is necessary for the geometer to idealize the physical points, lines, and planes. A geometrical point is defined as having no dimensions but simply an indicator of position. Again, a geometrical line has no width, and a geometrical plane has no thickness.

The probability that $x$ is equal to or less than some constant $a$ is expressed by

$$\mathcal{P}(x \le a) = \int_{-\infty}^{a} f(x)\,dx. \qquad (4)$$

Again, the probability of $x$ lying between $a$ and $b$ is given by

$$\mathcal{P}(a \le x \le b) = \int_{a}^{b} f(x)\,dx. \qquad (5)$$

It is possible to omit the equal signs in the left sides of (4) and (5) since the probability of obtaining any particular value of $x$ is equal to the width of a geometrical line, which is zero.


Fig. 3.4. “Idealized,” or theoretical, probability distribution. [f(x) plotted against x, with the interval from x to x + dx marked.]

Fig. 3.5. Cumulative probability distribution for a continuous variate. [F(x) plotted against x, with the points a and b marked.]

3.6. The Normal Distribution. The most important continuous distribution in applied statistics is the normal distribution. The histogram of Fig. 3.3 and the theoretical distribution of Fig. 3.4 are those for data specified as being normally distributed. Data arising from many different measurements taken on plants and animals are specified as following the normal distribution. There is empirical justification for this specification. Similarly, distributions of certain data of the physical and social sciences are found to be satisfactorily represented by the normal distribution. It should not be assumed, however, that every continuous distribution representing actual data should be normal. For example, it is known that the distribution of sizes of cumulus clouds should be represented by a U-shaped curve. The mathematical form of the normal curve is defined by

$$f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}\,dx, \qquad -\infty < x < \infty. \qquad (6)$$

The curve is symmetrical about $x = \mu$ and bell-shaped as in Fig. 3.4. The inflection points are at $x = \mu \pm \sigma$, and the tails of the curve, although approaching the $x$ axis quite rapidly, extend indefinitely far in both directions. The function (6) represents a two-parameter family, $\mu$ and $\sigma$, of continuous distributions, that is, as $\mu$ and $\sigma$ vary in magnitude a family of distributions is generated.

Since it can be shown for (6) that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1,$$

then the normal probability distribution has this same property in common with the binomial and the Poisson distributions.

Table I in the Appendix gives ordinates and areas for the normal distribution.
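The statement that the area under the normal curve is 1 can be illustrated numerically. The sketch below uses a simple trapezoidal rule, not a method from the text, and the values $\mu = 50$, $\sigma = 10$ are arbitrary illustrative choices. The infinite range is cut off at $\mu \pm 10\sigma$, beyond which the ordinates are negligible.

```python
from math import exp, pi, sqrt

def normal_pdf(x, mu=0.0, sigma=1.0):
    """Ordinate of the normal curve with mean mu and standard deviation sigma."""
    return exp(-(x - mu)**2 / (2 * sigma**2)) / (sigma * sqrt(2 * pi))

def trapezoid(f, a, b, steps=20000):
    """Approximate the integral of f from a to b by the trapezoidal rule."""
    h = (b - a) / steps
    total = 0.5 * (f(a) + f(b))
    for i in range(1, steps):
        total += f(a + i * h)
    return total * h

mu, sigma = 50.0, 10.0  # illustrative parameter values only
pdf = lambda x: normal_pdf(x, mu, sigma)

area = trapezoid(pdf, mu - 10 * sigma, mu + 10 * sigma)  # whole curve, truncated
within_one_sigma = trapezoid(pdf, mu - sigma, mu + sigma)
print(round(area, 6), round(within_one_sigma, 4))
```

The second figure, the area between the two inflection points, is about .68 whatever values are chosen for $\mu$ and $\sigma$, which is why a single table of areas serves the whole two-parameter family.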

3.7. Probability Distributions as Specialized Mathematical Functions. We have noticed that theoretical probability distributions are mathematical functions satisfying certain requirements. In order to give a complete formal definition of the requirements necessary for a mathematical function to be a probability distribution of statistics, it is convenient and sufficient to consider the cumulative distribution function, $F(x)$. It is sufficient since, given the cumulative distribution, it is possible to find the distribution itself by taking the differential, that is,

$$d[F(x)] = f(x)\,dx.$$

A mathematical function, $F(x)$, may be used as a cumulative distribution of a chance variable provided that

(a) $F(-\infty) = 0$, $F(+\infty) = 1$;

(b) $F(x)$ is a nondecreasing function, that is, if $x_1 > x_2$, $F(x_1) \ge F(x_2)$;

(c) $F(x)$ is defined at every point in a continuous range and is continuous, except possibly at a denumerable number of points.

The following notation should be kept clearly in mind.

$f(x)$ is the frequency function.

$F(x)$ is the cumulative distribution.

3.8. Some Mathematical Functions Useful as Probability Distributions.

Karl Pearson⁵,⁶ has suggested the differential equation

$$\frac{dy}{dx} = \frac{(d - x)\,y}{a + bx + cx^2}$$

as a generator of possible parent population distributions useful in applied statistics. For example, if $d = \mu$, $b = c = 0$, and $a = \sigma^2$, then the differential equation becomes

$$\frac{dy}{y} = \frac{(\mu - x)\,dx}{\sigma^2}.$$

Solving for $y$, we have

$$\log_e y = -\frac{(x - \mu)^2}{2\sigma^2} + \log_e c.$$

Then

$$y = c\,e^{-(x-\mu)^2/2\sigma^2}.$$

Upon setting the integral between the limits from $-\infty$ to $\infty$ equal to 1, we find

$$c = \frac{1}{\sigma\sqrt{2\pi}}.$$

Hence,

$$y = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}, \qquad -\infty < x < \infty,$$

which is the normal frequency function. This is Type VII of the Pearson system of frequency functions.⁶

Another method of obtaining a mathematical representation of a frequency function is furnished by the Gram-Charlier series.⁶ This latter method will not be discussed here, but the interested reader should consult the reference.

3.9. The Gamma and Beta Functions. These are two useful functions in statistics of which extensive use will be made in subsequent chapters. The Gamma function of the positive number $n$ is defined by

$$\Gamma(n) = \int_0^{\infty} x^{n-1} e^{-x}\,dx, \qquad n > 0.$$

The properties of the Gamma function will be exhibited in Exercises 3.1 to 3.6.

The incomplete Gamma function, defined by

$$F(x) = I_x(n) = \frac{1}{\Gamma(n)}\int_0^{x} x^{n-1} e^{-x}\,dx = \frac{\Gamma_x(n)}{\Gamma(n)}, \qquad 0 < x < \infty,$$

furnishes a useful cumulative distribution function which will be discussed in Chap. 7.

The Beta function, defined by

$$B(m,n) = \int_0^{1} x^{m-1}(1 - x)^{n-1}\,dx, \qquad m, n > 0,$$

is of importance in theoretical and applied statistics. The properties of the Beta function will be exhibited in Exercises 3.7 to 3.9.

The cumulative incomplete Beta function is defined by

$$F(x) = I_x(m,n) = \frac{1}{B(m,n)}\int_0^{x} x^{m-1}(1 - x)^{n-1}\,dx = \frac{B_x(m,n)}{B(m,n)}, \qquad 0 \le x \le 1.$$

Both $I_x(n)$ and $I_x(m,n)$ have been tabulated by Karl Pearson and his staff at the Biometric Laboratory, University College, London.
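The defining integrals can be compared numerically with the library Gamma function. The sketch below is only illustrative: it uses a midpoint rule, truncates the infinite range of the Gamma integral at an assumed upper limit of 60, and takes $n = 3.5$, $m = 2$ as arbitrary values. The relation between the Beta and Gamma functions that it checks is the one stated in Exercise 3.9(c).

```python
from math import exp, gamma

def integrate(f, a, b, steps=100000):
    """Midpoint rule; avoids evaluating f at the end points."""
    h = (b - a) / steps
    return h * sum(f(a + (i + 0.5) * h) for i in range(steps))

def gamma_integral(n, upper=60.0):
    """Integral of x^(n-1) e^(-x) from 0 to `upper` (tail beyond it is negligible)."""
    return integrate(lambda x: x**(n - 1) * exp(-x), 0.0, upper)

def beta_integral(m, n):
    """Integral of x^(m-1) (1 - x)^(n-1) from 0 to 1."""
    return integrate(lambda x: x**(m - 1) * (1 - x)**(n - 1), 0.0, 1.0)

n, m = 3.5, 2.0  # arbitrary illustrative values
print(gamma_integral(n), gamma(n))                               # defining integral vs. library value
print(gamma(n + 1), n * gamma(n))                                # Gamma(n + 1) = n Gamma(n)
print(beta_integral(m, n), gamma(m) * gamma(n) / gamma(m + n))   # B(m,n) = Gamma(m)Gamma(n)/Gamma(m+n)
```

Each printed pair should agree to several decimal places; the small discrepancies come from the numerical integration, not from the identities.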


EXERCISES 

3.1. Use integration by parts to show that $\Gamma(n + 1) = n\Gamma(n)$.

3.2. Show that $\Gamma(n + 1) = n(n - 1)\cdots(n - k)\,\Gamma(n - k)$, where $k$ is a positive integer less than $n$.

3.3. If $n$ is also a positive integer, show that $\Gamma(n + 1) = n!$.

3.4. Show that $\Gamma(n)$ becomes infinite when $n \to 0$.

3.5. Show that $\Gamma(n) = 2\int_0^{\infty} y^{2n-1} e^{-y^2}\,dy$ by setting $x = y^2$ in the integral defining the Gamma function.

3.6. Using the result of Exercise 3.5, show that

(a) $\Gamma\left(\tfrac12\right) = 2\int_0^{\infty} e^{-y^2}\,dy$,

(b) $\left[\Gamma\left(\tfrac12\right)\right]^2 = 4\int_0^{\infty}\int_0^{\infty} e^{-(x^2+y^2)}\,dx\,dy$,

(c) $\left[\Gamma\left(\tfrac12\right)\right]^2 = 4\int_0^{\pi/2}\int_0^{\infty} e^{-r^2}\,r\,dr\,d\theta$ in polar coordinates.

3.7. By setting $x = \sin^2\theta$ in the integral defining $B(m,n)$, show that

$$B(m,n) = 2\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta.$$

3.8. By setting $x = 1 - y$ in the integral defining $B(m,n)$ show that $B(m,n) = B(n,m)$.

3.9. Show that

(a) $\Gamma(n)\Gamma(m) = 4\int_0^{\infty}\int_0^{\infty} x^{2m-1} y^{2n-1} e^{-(x^2+y^2)}\,dx\,dy$,

(b) $\Gamma(n)\Gamma(m) = 4\int_0^{\pi/2} \sin^{2m-1}\theta\,\cos^{2n-1}\theta\,d\theta \int_0^{\infty} r^{2(m+n)-1} e^{-r^2}\,dr$,

(c) $\Gamma(n)\Gamma(m) = B(m,n)\,\Gamma(m + n)$, or

$$B(m,n) = \frac{\Gamma(m)\Gamma(n)}{\Gamma(m + n)}.$$

3.10. Construct a parent population probability distribution for the sum of numbers appearing when two dice are tossed. Give (a) a table of $f(x)$; (b) a graph of $f(x)$; (c) a graph of $F(x)$.

3.11. If you have a set of random numbers from 0 to 999, how would you set up a sampling scheme to select a random sample of 50 from 490 farms, the farms being numbered 0 to 489?

tiofw”*T: “5™*”“ in « <« *. Ml diotribu. 

tion^h*’ ln E ” d “ 310 ,or *> p »— 

3.14. Given the frequency function $f(x) = 2x$, $0 < x < 1$, $f(x) = 0$ for $x < 0$ or $x > 1$. Show that $\int_0^1 f(x)\,dx = 1$. What is the functional form of the cumulative distribution function?

f»L“o“ ,he und “ th ‘'™ '* 1 *■“ “”t«w d»«. 


f(x) 


2(6 - x) 


0 < x < 6. 


3.16. Repeat Exercise 3.15 for the rectangular distribution

$$f(x) = \frac{1}{w}, \qquad 0 < x < w.$$

3.17. Repeat Exercise 3.15 for the Cauchy distribution

$$f(x) = \frac{1}{\pi}\,\frac{1}{1 + x^2}, \qquad -\infty < x < \infty.$$

3.18. A random variable x, which lies between the limits 0 and 10 has 
he frequency functron f(x) = Ax’. Determine the value of A so that 

Ind 5 * Iw 7 18 k iS the P robabilit y tba * * lies between 2 

ana 5 f that x is less than 3? 

3.19. A random variable follows the normal distribution. Determine the coordinates of the maximum point on the frequency curve, and show that $\mu \pm \sigma$ are the inflexion points. Show that the area under the normal curve is 1.

3.20. A variate has the distribution $f(x) = e^{-x}$ in the interval $0 < x < \infty$.
The probability is that x will exceed what value? 


3.21. If $\mathcal{P}(x < x_1) = 1 - \dfrac{1}{1 + x_1}$, $x$ being a continuous variate with range $0 < x < \infty$, find the frequency function $f(x)$.

3.22. Ma y f(x) = — \/{x — 2) 2 , 0 < x < 4, serve as a probability dis¬ 
tribution function? Why? 

3.23. Use the table of areas of the normal curve to determine, for the normal distribution given in Exercise 3.19, the probability of (a) $x > \mu$, (b) $(\mu - \sigma) < x < (\mu + \sigma)$, (c) $x > \mu + 2\sigma$.

3.24. Give the Poisson approximation to the binomial distribution with 
n = 2,048 and p = 1/1,024. Hence, obtain the probabilities of there 
being 0, 1, 2, 3, . . . times that 10 tails appear in 2,048 tosses of 10 coins. 

3.25. It can be shown for large $n$ that the binomial distribution may be approximated by the normal distribution with $\mu = np$ and $\sigma^2 = npq$. If 20 coins are tossed, obtain the approximate probability of obtaining 8 or more heads.

3.26. For the distribution $f(x) = 2x$, $0 < x < 1$, find the number $a$ such that the probability of $x > a$ is 3 times the probability of $x < a$.

3.27. If two values of $x$ are drawn at random from the distribution $f(x) = e^{-x}$, $0 < x < \infty$, what is the probability that both are greater than 1?

3.28. In Exercise 3.27 what is the probability that at least one value of $x$ is greater than 1?

CHAPTER 4


PROPERTIES OF UNIVARIATE DISTRIBUTION FUNCTIONS 

4.1. Introduction. In this chapter certain important properties of parent population distributions will be discussed. The discussion will also apply to similar properties of derived sampling distributions. These properties will be found useful in describing parent population distributions and derived sampling distributions.

4.2. Mathematical Expectation. The mathematical expectation of any random variable $x$, which can assume values $x_1, x_2, \ldots, x_n$ with probabilities $p_1, p_2, \ldots, p_n$ respectively, where $\sum_{i=1}^{n} p_i = 1$, is defined to be

$$E(x) = \sum_{i=1}^{n} x_i p_i.$$

For the discrete distribution this becomes

$$E(x) = \sum_{i=1}^{n} x_i f(x_i),$$

for distributions with $p_i = f(x_i)$. For the continuous distribution

$$E(x) = \int_{-\infty}^{\infty} x f(x)\,dx,$$

where as noted before $f(x)$ may vanish over part of the range $(-\infty, +\infty)$. The term “mathematical expectation” may be shortened to “expected value” or “average value.”

The definition of the expected value of any random variable $x$ may be generalized to include functions of $x$. The expected value of any function of $x$, say $\theta(x)$, is

$$E[\theta(x)] = \sum \theta(x) f(x) \quad \text{or} \quad \int \theta(x) f(x)\,dx,$$

over the range of $x$ and depending upon whether $x$ is a discrete or continuous variate.

It is possible, by introducing a generalized form of summation called a “Stieltjes integral,” to replace the $\sum$ or $\int$ by a single integral sign. Although the concept of the Stieltjes integral simplifies statements concerning probabilities and expected values, the mathematical concepts behind this refinement are beyond the scope of this text. A distinction in all statements will be made between the discrete and continuous distributions.

If the indicated integration, necessary to obtain the expected value, cannot be performed directly, various methods of numerical integration are available.

Example 4.1. The expected value of $x$, where $x$ gives the various sums possible to be made by throwing two ordinary dice, may be found from the following table:

i             1   2   3   4   5   6   7   8   9   10   11
x_i           2   3   4   5   6   7   8   9   10  11   12
36 f(x_i)     1   2   3   4   5   6   5   4   3   2    1

Then,

$$E(x) = \sum_{i=1}^{11} x_i f(x_i) = \frac{2 + 6 + 12 + \cdots + 12}{36} = 7.$$

Example 4.2. If $f(x) = 3x^2$, $0 < x < 1$, then

$$E(x) = \int_0^1 x \cdot 3x^2\,dx = \frac34.$$

Also,

$$E(x^2) = \int_0^1 x^2 \cdot 3x^2\,dx = \frac35.$$
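Since the frequency function of Example 4.2 is a polynomial, its expected values can be obtained exactly by integrating term by term. The following Python sketch does this with exact fractions; the helper names `integrate_poly_0_1` and `times_x` are illustrative assumptions, and the polynomial is stored as a list of coefficients in ascending powers of $x$.

```python
from fractions import Fraction

def integrate_poly_0_1(coeffs):
    """Integral over (0, 1) of sum(coeffs[k] * x**k), taken term by term."""
    return sum(Fraction(c, k + 1) for k, c in enumerate(coeffs))

def times_x(coeffs, power=1):
    """Multiply the polynomial by x**power by shifting its coefficients."""
    return [0] * power + list(coeffs)

f = [0, 0, 3]  # f(x) = 3x^2 on 0 < x < 1

total = integrate_poly_0_1(f)               # area under f(x)
mean = integrate_poly_0_1(times_x(f, 1))    # E(x)
second = integrate_poly_0_1(times_x(f, 2))  # E(x^2)
print(total, mean, second)  # 1 3/4 3/5
```

The first figure confirms that $f(x)$ is a proper frequency function on the interval, and the other two reproduce the values found in the example.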

4.3. Operations with Expected Values. The rules stated below will be found useful in operating with expected values.

1. The expected value of a constant is the constant itself. $E(c) = c$.

2. The expected value of a constant times a variable is the constant times the expected value of the variable. $E[c\,\theta(x)] = cE[\theta(x)]$.

3. The expected value of a sum (or difference) of two variates or functions is the sum (or difference) of the expected values of the separate parts. $E[\theta_1(x) \pm \theta_2(x)] = E[\theta_1(x)] \pm E[\theta_2(x)]$.

The proof of these statements is left as an exercise for the student.

4.4. Moments. The expected value of $x^k$ is called the $k$th moment of $x$ about the origin and is represented by the symbol $\mu'_k$. Hence,

$$\mu'_k = E(x^k) = \sum x^k f(x) \quad \text{or} \quad \int x^k f(x)\,dx,$$

over the range of $x$. The first moment of $x$ about the origin is referred to as the mean of $x$, and is denoted by $\mu'_1$. To simplify writing let $\mu'_1 = \mu$.

The expected value of $(x - \mu)^k$ is called the $k$th moment of $x$ about the mean and is designated by the symbol $\mu_k$. Hence

$$\mu_k = E(x - \mu)^k = \sum (x - \mu)^k f(x) \quad \text{or} \quad \int (x - \mu)^k f(x)\,dx,$$

over the range of $x$. It is easily seen that

$$\mu_1 = E(x - \mu) = 0.$$

The second moment about the mean, $\mu_2$, is called the variance and is usually designated by the symbol $\sigma^2$. Hence,

$$\sigma^2 = \mu_2 = E(x - \mu)^2 = \sum (x - \mu)^2 f(x) \quad \text{or} \quad \int (x - \mu)^2 f(x)\,dx,$$

over the range of $x$. The formula for $\sigma^2$ may be written as follows.

$$\sigma^2 = E(x - \mu)^2 = E(x^2) - 2\mu E(x) + \mu^2 = E(x^2) - \mu^2 = \mu'_2 - \mu^2.$$

The square root of $\sigma^2$, or $\sigma$, is referred to as the standard deviation.

The third moment about the mean, $\mu_3$, furnishes a measure of skewness, or departure from symmetry about the mean of the distribution. One of the most generally accepted measures of skewness is the nondimensional quantity

$$\alpha_3 = \frac{\mu_3}{\sigma^3}.$$

It is seen that $\alpha_3$ will be zero for a symmetric distribution.

A measure of the relative flatness or peakedness of the distribution, called the kurtosis, is given by the nondimensional quantity

$$\alpha_4 = \frac{\mu_4}{\sigma^4}.$$

Example 4.3. Using the table of values of Example 4.1, we see that $\mu = 7$. Also,

$$\mu'_2 = \frac{4 + 18 + 48 + \cdots + 144}{36} = \frac{1{,}974}{36}.$$

Hence,

$$\sigma^2 = \frac{1{,}974}{36} - 49 = \frac{35}{6}.$$
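The moments of the two-dice distribution can be computed directly from the definition. The Python sketch below builds the distribution by enumerating the 36 equally likely outcomes and uses exact fractions; the function name `moment_about_origin` is an illustrative choice.

```python
from collections import Counter
from fractions import Fraction

# Frequency of each sum among the 36 equally likely outcomes of two dice
counts = Counter(a + b for a in range(1, 7) for b in range(1, 7))
f = {x: Fraction(c, 36) for x, c in counts.items()}

def moment_about_origin(f, k):
    """k-th moment about the origin: sum of x^k f(x) over the range of x."""
    return sum(x**k * p for x, p in f.items())

mu = moment_about_origin(f, 1)         # mean
mu2_prime = moment_about_origin(f, 2)  # second moment about the origin
variance = mu2_prime - mu**2           # sigma^2 = mu'_2 - mu^2
print(mu, mu2_prime, variance)  # 7 329/6 35/6
```

Here 329/6 is 1,974/36 in lowest terms, so the result agrees with the hand calculation.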


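Example 4.4. Using the distribution function of Example 4.2, that is, $f(x) = 3x^2$, $0 < x < 1$,

$$\sigma^2 = \int_0^1 \left(x - \tfrac34\right)^2 \cdot 3x^2\,dx = \frac{3}{80}$$

and

$$\mu_3 = \int_0^1 \left(x - \tfrac34\right)^3 \cdot 3x^2\,dx = -\frac{1}{160}.$$

Example 4.5. In order that the function,

$$f(x)\,dx = k e^{-c(x-m)^2}\,dx, \qquad -\infty < x < \infty,$$

represent a probability distribution, it is necessary that

$$\int_{-\infty}^{\infty} f(x)\,dx = 1.$$

Hence, we find $k = \sqrt{c/\pi}$.

Also,

$$\mu = \int_{-\infty}^{\infty} x f(x)\,dx = m,$$

and

$$\sigma^2 = \int_{-\infty}^{\infty} (x - \mu)^2 f(x)\,dx = \frac{1}{2c}.$$

Substituting these values in the original function, we find

$$f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}\,dx, \qquad -\infty < x < \infty,$$

which is the normal distribution.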

Example 4.6. For the binomial distribution,

$$f(x) = C(n,x)\,p^x q^{n-x},$$

$$\mu = \sum_{x=0}^{n} x\,\frac{n!}{(n - x)!\,x!}\,p^x q^{n-x} = np\sum_{x=1}^{n} \frac{(n - 1)!}{(n - x)!\,(x - 1)!}\,p^{x-1} q^{n-x} = np(q + p)^{n-1}.$$

$$\mu = np.$$

4.5. Moment Generating Functions. The expected value of $e^{tx}$ often provides a convenient short cut in evaluating the moments of $x$. Since

$$e^{tx} = 1 + \frac{tx}{1!} + \frac{t^2x^2}{2!} + \frac{t^3x^3}{3!} + \cdots,$$

we have

$$E(e^{tx}) = 1 + \sum_{i=1}^{\infty} \mu'_i\,\frac{t^i}{i!}.$$

We set $m(t) = E(e^{tx})$ and designate $m(t)$ the moment generating function of $x$. If $m(t)$ be differentiated $k$ times with respect to $t$ and then evaluated at $t = 0$, we note that

$$\left.\frac{d^k m(t)}{dt^k}\right|_{t=0} = \mu'_k.$$

To obtain the moment generating function of $x - \mu$, we consider

$$e^{t(x-\mu)} = 1 + \frac{t(x - \mu)}{1!} + \frac{t^2(x - \mu)^2}{2!} + \frac{t^3(x - \mu)^3}{3!} + \cdots.$$

Then,

$$E[e^{t(x-\mu)}] = 1 + \sum_{i=1}^{\infty} \mu_i\,\frac{t^i}{i!}.$$

If we set $M(t) = E[e^{t(x-\mu)}]$, then

$$\mu_k = \left.\frac{d^k M(t)}{dt^k}\right|_{t=0}.$$

In general the moment generating function of any function $\theta(x)$ may be defined as

$$E[e^{t\theta(x)}].$$

Example 4.7. For the distribution,

$$f(x)\,dx = e^{-x}\,dx, \qquad 0 < x < \infty,$$

$$m(t) = \int_0^{\infty} e^{-(1-t)x}\,dx = (1 - t)^{-1} = \sum_{i=0}^{\infty} t^i.$$

Hence,

$$\mu'_i = i!.$$

Example 4.8. For the binomial distribution 

$$m(t) = \sum_{x=0}^{n} e^{tx} C(n,x)\,p^x q^{n-x} = \sum_{x=0}^{n} C(n,x)(pe^t)^x q^{n-x} = (q + pe^t)^n.$$

Then, 

$$\mu'_1 = \mu = \left. npe^t(q + pe^t)^{n-1}\right|_{t=0} = np,$$

$$\mu'_2 = np\left[e^t(n-1)(q+pe^t)^{n-2}pe^t + (q+pe^t)^{n-1}e^t\right]_{t=0} = np[(n-1)p + 1].$$

$$\therefore\ \mu_2 = \sigma^2 = \mu'_2 - (\mu)^2 = np(1-p) = npq.$$

We may obtain the following relation between $M(t)$ and $m(t)$: 

$$M(t) = E[e^{t(x-\mu)}] = e^{-t\mu}E(e^{tx}),$$

$$\therefore\ M(t) = e^{-t\mu}m(t).$$
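The binomial results of Examples 4.6 and 4.8 can be verified by direct summation over the frequency function. The following is a minimal sketch; the values n = 5 and p = 1/3 are illustrative choices and do not come from the text.

```python
from fractions import Fraction
from math import comb, exp, isclose

n, p = 5, Fraction(1, 3)   # illustrative values, not taken from the text
q = 1 - p

# f(x) = C(n,x) p^x q^(n-x), kept as exact fractions
pmf = {x: comb(n, x) * p**x * q**(n - x) for x in range(n + 1)}

mean = sum(x * f for x, f in pmf.items())         # mu'_1
second = sum(x * x * f for x, f in pmf.items())   # mu'_2
variance = second - mean**2                       # mu_2

def mgf_by_sum(t):
    """m(t) = sum of e^(tx) f(x) over x = 0, ..., n."""
    return sum(exp(t * x) * float(f) for x, f in pmf.items())

def mgf_closed_form(t):
    """m(t) = (q + p e^t)^n, the closed form of Example 4.8."""
    return (float(q) + float(p) * exp(t)) ** n

print(mean, second, variance)
```

The exact moments can be compared with $np$, $np[(n-1)p+1]$, and $npq$, and the two forms of m(t) agree to floating-point accuracy for any real t.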
Besides furnishing a simple method of obtaining the moments in certain 
cases, the moment generating function is of use in deriving distribution 
functions and in comparing distributions. These latter two uses follow 
from Theorems 4.1 and 4.2 of Sec. 4.7. 

The expected value of $e^{tx}$ may not exist for real values of t for many 
discrete and continuous distributions, for example, for the Cauchy 
distribution 

$$f(x)\,dx = \frac{1}{\pi}\,\frac{dx}{1 + x^2}, \qquad -\infty < x < \infty.$$

A more general function, which can be proved always to exist, is the 
characteristic function defined as $E(e^{itx})$, where t is real. The 
characteristic function for the Cauchy distribution is $e^{-|t|}$. However, the 
evaluation of the integral, necessary to obtain the characteristic function, makes 
use of more advanced mathematical methods than are assumed for this 
course. Our uses of Theorems 4.1 and 4.2 will be confined to the moment 
generating function, which will be assumed to exist in such cases. 

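4.6. Cumulants. Suppose that we define 

$$\log m(t) = K(t) = \kappa_1 t + \kappa_2\frac{t^2}{2!} + \cdots + \kappa_r\frac{t^r}{r!} + \cdots. \qquad (1)$$

But 

$$\log m(t) = t\mu + \log M(t)$$

and 

$$\log M(t) = \log\left[1 + \sum_{i=1}^{\infty} \mu_i\frac{t^i}{i!}\right].$$

Hence 

$$\log m(t) = t\mu + \log\left[1 + \left(\mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + \cdots\right)\right].$$

Since 

$$\log(1 + x) = x - \tfrac{1}{2}x^2 + \tfrac{1}{3}x^3 - \cdots,$$

we have 

$$\log m(t) = t\mu + \left(\mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + \cdots\right) - \frac{1}{2}\left(\mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + \cdots\right)^2 + \cdots$$

$$= \mu t + \mu_2\frac{t^2}{2!} + \mu_3\frac{t^3}{3!} + (\mu_4 - 3\mu_2^2)\frac{t^4}{4!} + \cdots. \qquad (2)$$

Hence, equating coefficients of like powers of t in (1) and (2), we find 
$\kappa_1 = \mu$, $\kappa_2 = \mu_2$, $\kappa_3 = \mu_3$, $\kappa_4 = \mu_4 - 3\mu_2^2$, etc. The function $K(t)$ is called 
the cumulant generating function. 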
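The relations between the first four cumulants and the moments about the mean can be applied directly to any discrete frequency function. The sketch below is an illustration only: the helper names and the choice of a fair-coin (0 or 1) variate are assumptions, not part of the text.

```python
from fractions import Fraction

def central_moments(pmf, orders=(2, 3, 4)):
    """Return the mean and the moments about the mean of a discrete pmf."""
    mu = sum(x * p for x, p in pmf.items())
    moments = {r: sum((x - mu) ** r * p for x, p in pmf.items()) for r in orders}
    return mu, moments

def first_four_cumulants(pmf):
    """kappa_1 = mu, kappa_2 = mu_2, kappa_3 = mu_3, kappa_4 = mu_4 - 3 mu_2^2."""
    mu, m = central_moments(pmf)
    return mu, m[2], m[3], m[4] - 3 * m[2] ** 2

# Illustrative variate: 0 or 1 with probability 1/2 each.
coin = {0: Fraction(1, 2), 1: Fraction(1, 2)}
k1, k2, k3, k4 = first_four_cumulants(coin)
print(k1, k2, k3, k4)
```

For this variate $\mu_2 = 1/4$ and $\mu_4 = 1/16$, so the fourth cumulant is negative even though every moment about the mean of even order is positive.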
Example 4.9. For the normal distribution 

$$f(x)\,dx = \frac{1}{\sigma\sqrt{2\pi}}\,e^{-(x-\mu)^2/2\sigma^2}\,dx, \qquad -\infty < x < \infty,$$

$$m(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tx}e^{-(x-\mu)^2/2\sigma^2}\,dx.$$

Let $x = \mu + y$, then 

$$m(t) = \frac{e^{t\mu}}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ty}e^{-y^2/2\sigma^2}\,dy = \frac{e^{t\mu}}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{-(y-\sigma^2t)^2/2\sigma^2}\,e^{\sigma^2t^2/2}\,dy.$$

$$\therefore\ m(t) = e^{t\mu + \sigma^2t^2/2}.$$

Now in order to read off the moments, we need the expansion of m(t) in 
series, but this is not very simple. However, the cumulants may be 
found quite simply, since for this case 

$$K(t) = \log m(t) = t\mu + \frac{t^2\sigma^2}{2}.$$

But 

$$K(t) = \kappa_1 t + \kappa_2\frac{t^2}{2!} + \kappa_3\frac{t^3}{3!} + \cdots.$$

Hence, for the normal distribution 

$$\kappa_1 = \mu, \qquad \kappa_2 = \sigma^2, \qquad \kappa_3 = \kappa_4 = \cdots = 0.$$

It is important to remember that all cumulants after and including $\kappa_3$ 
for the normal distribution are zero. 

Hence, for the normal distribution, it is simpler to read the cumulants 
from the cumulant generating function, $K(t)$. If the moments are 
desired, they may be obtained easily from the cumulants. Hence the 
use of either the moment generating function or the cumulant generating 
function depends on the form of the distribution function. 

4.7. An Inverse Problem. It was seen that, if we are given the 
theoretical distribution, then we may obtain a set of moments 
$(\mu'_1, \mu'_2, \ldots)$. In applied work we may have a large sample of observations 
and wish to determine from the data some evidence regarding an 
appropriate theoretical function to represent the assumed parent population 
distribution. 

From the table of values giving the empirical frequency distribution it 
would be possible to obtain sample moments for the large sample. If it 
be assumed that these sample moments are “good” estimates of the 
corresponding moments of some theoretical distribution, then we have the 
inverse problem of determining uniquely a theoretical distribution, 
given the moments. This problem is discussed in texts on advanced 
mathematical statistics and is beyond the scope of this course. The 
“Pearson system of curves” is an assumed set of continuous functions 
whose parameters may be expressed in terms of the moments. Hence, 
estimates of these parameters may be obtained from a large sample and 
some one of the theoretical curves “fitted” to the empirical frequency 
distribution. A test of “goodness of fit” may then be accomplished by 
the use of chi-square (see Chap. 12). 

Closely related to the moment problem mentioned above is the inverse 
relation of the characteristic function or the moment generating function 
to a possible corresponding distribution function. Two theorems from 
advanced theoretical statistics will be stated without proof and made use 
of in subsequent derivations. 

Theorem 4.1. A distribution function is uniquely determined by its 
characteristic function or, where it exists, the moment generating function. 

Theorem 4.2. If a distribution function has a characteristic function 
(moment generating function) which approaches the characteristic function 
(moment generating function) of another distribution, the two distributions 
approach each other. 

EXERCISES 

4.1. Find the mathematical expectation for the following distribution: 

    y     10     20     30     40 
    p     .1     .3     .5     .1 

4.2. Given the following probability laws, find $\mu$, $\sigma^2$ for each: 

(a) $f(x) = 10x^9$, $0 < x < 1$, 

(b) $f(x) = x/50$, $0 < x < 10$. 

4.3. A random variable can assume only two values 1 and 2 . Its 
mathematical expectation is f. Find p± and p 2 . 

4.4. A random variable has the distribution function f{x) — A - j- Bx, 

0 < 3 < 1 . The mathematical expectation is £. Find the constants 
A and B. 

4.5. Express $\mu_3$ and $\mu_4$ in terms of moments about zero. 

4.6. Use the cumulants for the normal distribution to determine the 
first four moments about the mean. 

4.7. Two other measures of skewness and kurtosis, or departure from 
normality, are $\gamma_1 = \kappa_3/(\kappa_2)^{3/2}$ and $\gamma_2 = \kappa_4/\kappa_2^2$. Show that $\gamma_1 = \alpha_3$ and 
$\gamma_2 = \alpha_4 - 3$. Find $\gamma_1$ and $\gamma_2$ for the normal distribution. 

4.8. Determine the first four cumulants for the binomial distribution. 
Verify that $\kappa_{r+1} = pq\,(d\kappa_r/dp)$, $r \ge 1$, for the cumulants obtained. 

4.9. Show that $E(x - x_1)^2$ is a minimum if $x_1 = E(x)$. 

IlO If x has the distribution function f(x) = i on the interval (0,z) 

and 1 — . a.n.,.«n e taction of *. »d th, v.rnm.e 

of X This is called a rectangular distribution. , , 

4.11. An unbiased penny is tossed 64 times. Find (a) the expected 
number of heads, (b) the theoretical standard deviation. 

4 12 A pair of dice is thrown 60 times. Find (a) the expected number 
of times that the sum 10 appears, (b) the expected value of the square o 

th ® S 1 t 3 dn ^ r e d re de J r 1 e t 6° urns containing, respectively, 1 white, 9 black; 
2 white 8 black; 3 white, 7 black; 4 white, 6 black; 6 white, 5 bkc 
4 white 6 black balls, dne ball is to be drawn from each urn What 
is the expected number of white balls taken? Let * be a friable which 
assumes the values 1 or 0 according as to whether the tnal F 

success or failure. Then, « = * + *. + ‘ ' '_+ *» “ ^ '"I ° 
successes in n trials. But Efa) - p* • 1 + (1 PO ^ 

2 n Hence E(m) = pi + P* + * * * + Pn * . „ , 

’ 4 14. ' An urn contains a red balls and b black balls, c bafle are drawn 

simultaneously. What is the expected number ° are 

4 15 An urn contains r tickets numbered from 1 to r and » tickets are 
drawn at a time. What is the expected sum of the numbers on the tickets 
drawn ^ Let * be the variable attached to the ith ticket which may 
assume any of the values 1, 2, . . • , r - Then, 


E(x 


-G) 


(1 + 2 +---+ 1 -)- 

Set m = *i + *2 + ' • ■ + and complete the solution by finding 

Find the moment generating function m(t) for the triangular 
distribution sketched in the accompanying figure. 

/<*> 
triangular distribution 


equals me moment generating function 
Exercise 4.16. 

fnrMo PUU f ° r tte binomial distribution using the formulas 

or the definitions of these moments. Note: x 2 = x(x — 1) -f x 


x * = x(x - l)(x - 2) + 3s 2 


2x, 


an f f a A ~ l ^ x ~ 2 X* - 3) + W - Ux* + 6®. 

7on . " 3 and “ 4 for the binomial distribution 

/L g J II j 2 ^ 

as, and a t for the following binomials: (£ -f |) 7 ; 


4.20. Find M , a 
(i + l) 18 . 


JSLSZvl ,or ,he p — *• 

4? HnT: 1 "**'”/'! 1 '? 3, ' - *: ^ or Poisson distribution. 

t M4 f ° r the Poisson distribution, (a) using the for 

““ ® "»"* «Ke 

4.24. Find /i 2j oj 8 , and for the Poisson distribution. 

4.25. Prove the general formula connecting the moments about $\mu$ with 
the moments about the origin: 

$$\mu_k = \mu'_k - k\mu\,\mu'_{k-1} + \frac{k(k-1)}{2!}\,\mu^2\mu'_{k-2} - \cdots.$$

Use the formula to obtain 

$$\mu_1 = 0,$$

$$\mu_2 = \mu'_2 - (\mu)^2,$$

$$\mu_3 = \mu'_3 - 3\mu\mu'_2 + 2(\mu)^3,$$

$$\mu_4 = \mu'_4 - 4\mu\mu'_3 + 6(\mu)^2\mu'_2 - 3(\mu)^4.$$



CHAPTER 5 


BIVARIATE AND MULTIVARIATE DISTRIBUTIONS AND THEIR 

PROPERTIES 

5.1. Introduction. In the previous chapters single-variate, or 
univariate, distributions and their properties have been discussed. It is 
proposed now to extend the discussion to cases of two or more variates, 
that is, bivariate and multivariate distributions. The discussion will apply 
alike to bivariate and multivariate parent population distributions and 
bivariate and multivariate derived sampling distributions. The latter 
distributions will be discussed in Chap. 6. 

5.2. Discrete Bivariate Distributions. Suppose that for every value 
of a given variate, x, we also know the values which a second variate, y, 
can take. Then it will be possible to construct a joint probability 
distribution, from which can be obtained the probability that any 
combination of x and y will occur in random draws. The bivariate frequency 
function will be represented symbolically by f(x,y). The conditions 
required for a mathematical function F(x,y) to be used as a cumulative 
joint probability distribution are analogous to those in Sec. 3.7 for the 
univariate case. These conditions imply the following conditions for f(x,y): 

(a) f(x,y) is nonnegative over the (x,y) plane. 

(b) $\sum_W f(x,y) = 1$, where W is the entire (x,y) region. 

(c) $\sum_w f(x,y)$ can be computed for any subregion, w, of W. 

Example 5.1. Consider the two-dice problem. Let x represent the 
number of spots showing on die 1 and y the number on die 2 in any one 
toss of the two dice. The joint probability distribution is given by Table 
5.1 ($p = \tfrac{1}{36}$). The frequency function is $f(x,y) = \tfrac{1}{36}$ for any pair of 
values, (x,y), for x or y = 1, 2, . . . , 6. Note that $\sum f(x,y) = 1$ over the 
ranges of both variates, $1 \le x \le 6$, $1 \le y \le 6$. The probability that x 
and y lie, at the same time, in ranges $a \le x \le b$, $c \le y \le d$ is given by 

$$\mathcal{P}(w) = \sum_w f(x,y), \qquad (1)$$

where w is the subregion defined by $a \le x \le b$, $c \le y \le d$. In the 
two-dice problem 

$$\mathcal{P}(1 \le x \le 3,\ 2 \le y \le 4) = \tfrac{9}{36} = \tfrac{1}{4}.$$

We may define a cumulative distribution function for the bivariate case 
in a manner similar to that for $\mathcal{P}(w)$ in (1) above. If F(b,d) represents the 
probability that $x \le b$, $y \le d$, then, 

$$F(b,d) = \sum_w f(x,y), \qquad (2)$$

where w is the subregion ($x \le b$, $y \le d$). For the two-dice problem 

W) = (A) = l 

If the subregion w is allowed to assume all possible values, then x and y 
will assume all possible pairs of values and (2) in three dimensions becomes 
analogous to the two-dimension step function of Sec. 3.3 and may be 
written as 

$$F(x,y) = \sum_w f(x,y) \qquad (3)$$

over the range of values of the subregion w. The function defined by (3) 
is the cumulative bivariate distribution function. 


Table 5.1 

Joint Probability Distribution in Two-dice Problem 

    y \ x      1      2      3      4      5      6     Total 
      1        p      p      p      p      p      p      6p 
      2        p      p      p      p      p      p      6p 
      3        p      p      p      p      p      p      6p 
      4        p      p      p      p      p      p      6p 
      5        p      p      p      p      p      p      6p 
      6        p      p      p      p      p      p      6p 
    Total     6p     6p     6p     6p     6p     6p     36p 


Suppose we extend this process. Add all of the probabilities for a given 
value of x, say $x_1$; then the range of values of the subregion w is simply the 
linear range of y, and hence we may write 

$$\sum_y f(x_1,y) = g(x_1),$$

which is the probability that $x = x_1$. If this be thought of as being done 
for all values of x, then we may write the frequency function of x as 

$$g(x) = \sum_y f(x,y), \qquad (4)$$

which is called the marginal distribution of x. Similarly, the marginal 
distribution of y is 

$$h(y) = \sum_x f(x,y). \qquad (5)$$

As can be seen from the border totals in Table 5.1, the distributions (4) 
and (5) may be exhibited as in Table 5.2. 

Table 5.2 

Marginal Distributions of x and y in Two-dice Problem 

    x = y             1      2      3      4      5      6 
    g(x) = h(y)      1/6    1/6    1/6    1/6    1/6    1/6 


Finally, the marginal distribution and the joint distribution may be 
used to define the conditional distribution, corresponding to the conditional 
probability discussed in Sec. 2.5. Using the notation f(y|x) to mean the 
probability of y, given x, we know that 

$$f(y|x) = f(x,y)/g(x), \qquad g(x) \ne 0, \qquad (6)$$

since 

$$f(x,y) = g(x)\,f(y|x)$$

by law 3 of Sec. 2.5. The distribution (6) is called the conditional 
distribution of y. Similarly, the conditional distribution of x is 

$$f(x|y) = f(x,y)/h(y), \qquad h(y) \ne 0. \qquad (7)$$

Now, if f(y|x) does not depend on x, then y and x are said to be 
independent variates, since 

$$f(x,y) = g(x) \cdot h(y). \qquad (8)$$

This is true for the two-dice problem since for any $x = x_i$, $f(y|x_i) = \tfrac{1}{6}$. 
Hence, by (8), 

$$f(x,y) = \tfrac{1}{6} \cdot \tfrac{1}{6} = \tfrac{1}{36},$$

for x, y = 1, 2, . . . , 6. 
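Equations (1) and (4) to (8) translate directly into sums over a table of joint probabilities. The following sketch builds the two-dice table of Example 5.1 and evaluates the marginal and conditional distributions; the dictionary layout and the helper name `prob` are illustrative choices.

```python
from fractions import Fraction
from itertools import product

values = range(1, 7)

# Joint frequency function f(x,y) = 1/36 for the two-dice problem.
f = {(x, y): Fraction(1, 36) for x, y in product(values, repeat=2)}

# Marginal distributions, equations (4) and (5).
g = {x: sum(f[x, y] for y in values) for x in values}
h = {y: sum(f[x, y] for x in values) for y in values}

# Conditional distribution f(y|x), equation (6), keyed by (y, x).
cond = {(y, x): f[x, y] / g[x] for x, y in f}

# Independence, equation (8): f(x,y) = g(x) h(y) for every pair.
independent = all(f[x, y] == g[x] * h[y] for x, y in f)

def prob(a, b, c, d):
    """P(a <= x <= b, c <= y <= d), equation (1)."""
    return sum(p for (x, y), p in f.items() if a <= x <= b and c <= y <= d)

print(g[1], cond[(3, 2)], independent, prob(1, 3, 2, 4))
```

The same code applies unchanged to any other discrete table; only the dictionary `f` needs to be replaced.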

5.3. Continuous Bivariate Distribution. In the case of two continuous 
variates, x and y, the probability that x is in the interval (x, x + dx) 
and y in the interval (y, y + dy) is 

$$f(x,y)\,dx\,dy. \qquad (9)$$

The graph of z = f(x,y) is called the frequency surface. The frequency 
function f(x,y) is nonnegative, but it may be zero over certain subregions 
of the (x,y) plane. 

The probability that (x,y) will fall in some subregion w of the (x,y) plane 
W is given by 

$$\mathcal{P}(w) = \iint_w f(x,y)\,dx\,dy, \qquad (10)$$

since 

$$\iint_W f(x,y)\,dx\,dy = 1. \qquad (11)$$

The cumulative distribution function is given by 

$$F(x,y) = \int_{-\infty}^{x}\int_{-\infty}^{y} f(x,y)\,dy\,dx. \qquad (12)$$

The marginal distribution of x is given by 

$$g(x) = \int_{-\infty}^{\infty} f(x,y)\,dy,$$

and similarly for h(y). 

Finally, the conditional probability that y lies in the interval (y, y + dy), 
given that x is in the interval (x, x + dx), is 

$$f(y|x)\,dy = \frac{f(x,y)\,dy\,dx}{g(x)\,dx} = \frac{f(x,y)\,dy}{g(x)}. \qquad (13)$$

Again, if f(y|x) is independent of x, that is, if the right side of (13) does not 
contain x after algebraic simplification, then x and y are said to be 
independent variates and 

$$f(x,y) = g(x) \cdot h(y).$$

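Example 5.2. Given $f(x,y)\,dx\,dy = e^{-x-y}\,dx\,dy$, (x, y > 0). 

(a) $F(x,y) = \int_0^x\int_0^y e^{-x-y}\,dy\,dx$. For $x = x_1 = 1$, $y = y_1 = 1$, 

$$F(x_1,y_1) = \mathcal{P}(x, y < 1) = (1 - e^{-x_1})(1 - e^{-y_1}) = (1 - e^{-1})^2 = .3996.$$

(b) $g(x) = \int_0^{\infty} e^{-x-y}\,dy = e^{-x}$. 

(c) $f(y|x) = e^{-x-y}/e^{-x} = e^{-y}$ (independent of x). 

Example 5.3. The normal bivariate distribution with means of x 
and y both zero is 

$$f(x,y)\,dy\,dx = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1-\rho^2}}\exp\left\{-\frac{1}{2(1-\rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{y^2}{\sigma_y^2}\right]\right\}dy\,dx,$$

where $-\infty < x < \infty$; $-\infty < y < \infty$. 

(a) $g(x) = \int_{-\infty}^{\infty} f(x,y)\,dy = \dfrac{1}{\sigma_x\sqrt{2\pi}}\,e^{-x^2/2\sigma_x^2}$. 

(b) $f(y|x) = \dfrac{f(x,y)}{g(x)} = \dfrac{1}{\sigma_y\sqrt{2\pi(1-\rho^2)}}\exp\left\{-\dfrac{1}{2(1-\rho^2)}\left[\dfrac{y}{\sigma_y} - \dfrac{\rho x}{\sigma_x}\right]^2\right\}$. 

If $\rho = 0$, 

$$f(y|x) = \frac{1}{\sigma_y\sqrt{2\pi}}\,e^{-y^2/2\sigma_y^2} = h(y);$$

hence, in this case, x and y are independent. Hence, $\rho$ may be used as a 
measure of relationship between x and y. It is, in fact, the population 
correlation coefficient between x and y. 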
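The statements of Example 5.3 can be checked numerically at any point. The sketch below evaluates f(y|x) as the ratio f(x,y)/g(x) and compares it with the closed form obtained by completing the square; the function names and the test values of $\sigma_x$, $\sigma_y$, and $\rho$ are arbitrary illustrations.

```python
from math import exp, pi, sqrt, isclose

def bivariate_normal(x, y, sx, sy, rho):
    """Zero-mean bivariate normal frequency function of Example 5.3."""
    k = 1.0 / (2 * pi * sx * sy * sqrt(1 - rho ** 2))
    q = (x / sx) ** 2 - 2 * rho * x * y / (sx * sy) + (y / sy) ** 2
    return k * exp(-q / (2 * (1 - rho ** 2)))

def normal(x, s):
    """Zero-mean univariate normal frequency function with standard deviation s."""
    return exp(-x * x / (2 * s * s)) / (s * sqrt(2 * pi))

def conditional(y, x, sx, sy, rho):
    """f(y|x) computed as f(x,y)/g(x), equation (13)."""
    return bivariate_normal(x, y, sx, sy, rho) / normal(x, sx)

def conditional_closed_form(y, x, sx, sy, rho):
    """f(y|x) after completing the square in the exponent."""
    c = 1.0 / (sy * sqrt(2 * pi * (1 - rho ** 2)))
    return c * exp(-((y / sy - rho * x / sx) ** 2) / (2 * (1 - rho ** 2)))

print(conditional(-1.2, 0.3, 1.5, 0.7, 0.6))
print(conditional(-1.2, 0.3, 1.5, 0.7, 0.0), normal(-1.2, 0.7))
```

With $\rho = 0$ the conditional value coincides with h(y), while for $\rho \ne 0$ it changes with x.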

5.4. Distribution of Functions of Discrete Variates. In order to obtain 
certain properties of bivariate and multivariate discrete or continuous 
distributions, such as expected values, moments, and moment generating 
functions, it is sometimes necessary to make transformations of the 
variates so that the summations or integrals may be evaluated. The 
discrete case will be considered first. 

The distribution of a function of x, say $z = \psi(x)$, given the distribution 
of x, is simple if there is a one-to-one correspondence between x and z, 
that is, if for every value of x there is only one value of z, and vice versa. 
In this case, the same probabilities hold for z as for x. For example, 
consider a single die with $f(x) = \tfrac{1}{6}$ (x = 1, 2, . . . , 6). Suppose $z = x^2$. In 
general, there is not a one-to-one correspondence between x and z, because 
$x = \pm\sqrt{z}$, resulting in two values of x for each z. However, in our die 
problem, x must be positive; hence $x = +\sqrt{z}$ only. Therefore, 

$$f(z) = \tfrac{1}{6}, \qquad \text{for } z = 1, 4, \ldots, 36.$$

On the other hand, suppose $z = (x - 1)(x - 2)$, or $x = (3 \pm \sqrt{1 + 4z})/2$. 
Then, even for x always positive, there is not a one-to-one correspondence 
between x and z, since, say, for z = 0, x = 1 or 2. Hence, in this case 

$$f(z) = \begin{cases} \tfrac{1}{3} & \text{for } z = 0, \text{ for two integral values of } x \text{ in the range } 1 \le x \le 6, \\ \tfrac{1}{6} & \text{for } z = 2, 6, 12, 20, \text{ for one integral value of } x \text{ in the range } 1 \le x \le 6, \\ 0 & \text{elsewhere, no integral value of } x \text{ in the range } 1 \le x \le 6. \end{cases}$$

Again, if $z = (x - 1)(x - 2) \cdots (x - 6)$, 

$$f(z) = \begin{cases} 1 & \text{for } z = 0, \text{ for six values of } x \text{ in the range } 1 \le x \le 6, \\ 0 & \text{elsewhere, no values of } x \text{ in the range } 1 \le x \le 6. \end{cases}$$

If we consider a bivariate distribution, such as that of the two-dice 
problem, the distribution of a function of the two variates is still simple. 
For example, consider the distribution f(w), where w = xy; then 

$$f(1) = f(1,1) = \tfrac{1}{36},$$

$$f(2) = f(2,1) + f(1,2) = \tfrac{2}{36},$$

$$f(3) = f(3,1) + f(1,3) = \tfrac{2}{36},$$

$$f(4) = f(4,1) + f(2,2) + f(1,4) = \tfrac{3}{36},$$

$$\cdots$$

$$f(7) = 0,$$

$$\cdots$$

$$f(36) = f(6,6) = \tfrac{1}{36}.$$

Here again $\sum f(w) = 1$ over the possible range. 
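The distribution of w = xy in the two-dice problem can be tabulated by collecting the joint probabilities of all pairs (x,y) that give the same product. The following sketch does this with exact fractions; the variable names are illustrative.

```python
from collections import Counter
from fractions import Fraction
from itertools import product

# Count how many of the 36 equally likely pairs (x, y) give each product w = xy.
counts = Counter(x * y for x, y in product(range(1, 7), repeat=2))

# f(w) is the sum of f(x,y) = 1/36 over the pairs mapping to w.
f_w = {w: Fraction(c, 36) for w, c in sorted(counts.items())}

for w, p in f_w.items():
    print(w, p)
```

Values of w that cannot occur as a product, such as 7, simply do not appear in the table, which corresponds to f(w) = 0.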

5.5. Distributions of Functions of Continuous Variates. The 
distribution of x is $f(x)\,dx$, x defined in the range $x_1$ to $x_2$. We seek the 
distribution of $z = z(x)$. If there is a one-to-one correspondence between 
x and z and x can be solved uniquely in terms of z, then $x = \psi(z)$, 
$dx = \psi'(z)\,dz$, $f(x) = f[\psi(z)]$, and the limits are $z_1 = z(x_1)$, $z_2 = z(x_2)$. The 
probability distribution of z, with these conditions, will be $f[\psi(z)]\,\psi'(z)\,dz$ 
over the range $z_1$ to $z_2$. 

Example 5.4. If $f(x)\,dx = 2(1 - x)\,dx$ for $0 < x < 1$, we find the 
distribution of z, where $z = x^2$, as follows: 

Since x is always positive, there is a one-to-one correspondence between 
z and x, that is, $x = +\sqrt{z}$; hence 

$$f(z)\,dz = 2(1 - \sqrt{z})\,\frac{1}{2\sqrt{z}}\,dz,$$

or 

$$f(z)\,dz = (z^{-1/2} - 1)\,dz, \qquad 0 < z < 1.$$

Example 5.5. If $f(x)\,dx = \dfrac{1 - x}{2}\,dx$, $-1 < x < 1$, then $x = \pm\sqrt{z}$ 
for $z = x^2$. For positive values of x, $x = +\sqrt{z}$, and for negative values 
of x, $x = -\sqrt{z}$. In this case 

$$f(z)\,dz = \begin{cases} \tfrac{1}{4}(z^{-1/2} - 1)\,dz, & x > 0, \\ \tfrac{1}{4}(z^{-1/2} + 1)\,dz, & x < 0, \end{cases} \qquad 0 < z < 1.$$

If we wish, we may add the two functions to obtain a single function 

$$f(z)\,dz = \tfrac{1}{2}z^{-1/2}\,dz, \qquad 0 < z < 1,$$

but this is not possible in all cases. 

The distribution of a function of two continuous variates, x and y, is 
more complex mathematically. We wish to derive the simultaneous 
distribution of u and v, $f(u,v)\,du\,dv$, where $u = u(x,y)$ and $v = v(x,y)$. In 
case we wish only the distribution of u, then v is integrated out between 
its limits of integration, leaving some $f(u)\,du$. In such cases it is 
necessary in general to assume a one-to-one correspondence between (x,y) and 
(u,v). It will then be possible to make the inverse solution: 

$$x = x(u,v), \qquad \text{and} \qquad y = y(u,v).$$

The probability distribution $f(x,y)\,dx\,dy$ then becomes 

$$f[x(u,v),\,y(u,v)]\,|J|\,du\,dv,$$

where J is the Jacobian of the transformation, that is, 

$$J = \begin{vmatrix} \dfrac{\partial x}{\partial u} & \dfrac{\partial y}{\partial u} \\[2ex] \dfrac{\partial x}{\partial v} & \dfrac{\partial y}{\partial v} \end{vmatrix} \qquad \text{or} \qquad \frac{1}{J} = \begin{vmatrix} \dfrac{\partial u}{\partial x} & \dfrac{\partial u}{\partial y} \\[2ex] \dfrac{\partial v}{\partial x} & \dfrac{\partial v}{\partial y} \end{vmatrix}.$$

This implies, of course, that these partial derivatives exist. If the second 
form is used, then J must be evaluated at x = x(u,v), y = y(u,v). 

The limits for u and v must be determined individually for each 
problem. The limits for the first variable in the integral may be functions of 
the second variable. 

Example 5.6. Let us find $f(u,v)\,du\,dv$, where $f(x,y)\,dx\,dy = e^{-x-y}\,dx\,dy$, 
$0 < x < \infty$, $0 < y < \infty$, $u = x + y$, $v = x/y$. 

Then, 

$$x = \frac{uv}{1 + v}, \qquad y = \frac{u}{1 + v},$$

and 

$$J = \begin{vmatrix} \dfrac{v}{1+v} & \dfrac{1}{1+v} \\[2ex] \dfrac{u}{(1+v)^2} & \dfrac{-u}{(1+v)^2} \end{vmatrix} = -\frac{u}{(1+v)^2},$$

or 

$$\frac{1}{J} = \begin{vmatrix} 1 & 1 \\[1ex] \dfrac{1}{y} & -\dfrac{x}{y^2} \end{vmatrix} = -\frac{x + y}{y^2} = -\frac{(1+v)^2}{u}.$$

$$\therefore\ f(u,v)\,du\,dv = \frac{ue^{-u}}{(1+v)^2}\,du\,dv.$$

Since, if u is fixed, v can assume any value, then the limits of integration 
for u and v are $0 < u < \infty$, $0 < v < \infty$. 

To find the distribution of u, we have 

$$f(u)\,du = \left[\int_0^{\infty}\frac{dv}{(1+v)^2}\right]ue^{-u}\,du = ue^{-u}\,du, \qquad 0 < u < \infty.$$

Similarly, 

$$f(v)\,dv = \frac{1}{(1+v)^2}\,dv, \qquad 0 < v < \infty.$$

Note that u and v are independent in this case. 
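The Jacobian of Example 5.6 can be checked numerically by differencing the inverse solution $x = uv/(1+v)$, $y = u/(1+v)$. The sketch below uses central differences with an arbitrarily chosen step; the function names are illustrative and the numerical estimate is only an approximation to the determinant.

```python
from math import exp, isclose

def inverse_map(u, v):
    """Inverse solution of u = x + y, v = x / y."""
    return u * v / (1 + v), u / (1 + v)

def jacobian_numeric(u, v, h=1e-6):
    """Central-difference estimate of J = (dx/du)(dy/dv) - (dy/du)(dx/dv)."""
    x_up, y_up = inverse_map(u + h, v)
    x_um, y_um = inverse_map(u - h, v)
    x_vp, y_vp = inverse_map(u, v + h)
    x_vm, y_vm = inverse_map(u, v - h)
    dx_du, dy_du = (x_up - x_um) / (2 * h), (y_up - y_um) / (2 * h)
    dx_dv, dy_dv = (x_vp - x_vm) / (2 * h), (y_vp - y_vm) / (2 * h)
    return dx_du * dy_dv - dy_du * dx_dv

def jacobian_formula(u, v):
    """Closed form J = -u / (1 + v)^2 from Example 5.6."""
    return -u / (1 + v) ** 2

def joint_density_uv(u, v):
    """f(u,v) = f[x(u,v), y(u,v)] |J| with f(x,y) = exp(-x-y)."""
    x, y = inverse_map(u, v)
    return exp(-x - y) * abs(jacobian_formula(u, v))

print(jacobian_numeric(2.0, 0.5), jacobian_formula(2.0, 0.5))
```

Because the joint density factors as $ue^{-u}$ times $(1+v)^{-2}$, evaluating `joint_density_uv` at any point and dividing by $ue^{-u}$ leaves a quantity that depends on v only.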
Example 5.7. Find f(u,v) du dv, where 


f(x,y) dx dy = 


(i - e y 


dx dy, 


$0 < x < y$, $0 < y < 1$, $u = x + y$, and $v = x - y$. 
We find easily $dx\,dy = \tfrac{1}{2}\,du\,dv$, and hence 


f(x,y) dx dy = g __ ^ du dv. 

However, the assignment of limits for u and v in this case is more involved. 
First, we plot as in Fig. 5.1 the region on the (x,y) plane with the following 
boundaries: x = y, y = 1, and x = 0. 

Fig. 5.1. Regions of integration for Example 5.7. 

Since $x = (u + v)/2$, $y = (u - v)/2$, then the boundary x = y becomes 
v = 0; y = 1 becomes u - v = 2; and x = 0 becomes u = -v. We 
now plot the new region determined by these new boundaries on the (u,v) 
plane as in Fig. 5.1. In the new region, u varies from -v to v + 2, and 
v varies from 0 to -1. Hence the new limits are $-v < u < v + 2$, 
$-1 < v < 0$. 

5.6. Expected Values for Bivariate Distributions. For any bivariate 
frequency function, f(x,y), the expected value of any function of x and y, 
say $\psi(x,y)$, is 

$$E[\psi(x,y)] = \sum_W \psi(x,y)f(x,y) \qquad \text{or} \qquad \iint_W \psi(x,y)f(x,y)\,dx\,dy,$$

where W is the entire region of (x,y). The following specializations of 
$\psi(x,y)$ will enable us to derive some simple rules for operating with 
expected values. Continuous variates will be used in the derivations, 
but similar rules will hold for discrete variates. 

(1) $E(c) = c$, where c is a constant. 

(2) $E(cx) = c\int_{-\infty}^{\infty} x\left[\int_{-\infty}^{\infty} f(x,y)\,dy\right]dx = c\int_{-\infty}^{\infty} xg(x)\,dx = cE(x)$. 

(3) $E(x + y) = \int_{-\infty}^{\infty} x\left[\int_{-\infty}^{\infty} f(x,y)\,dy\right]dx + \int_{-\infty}^{\infty} y\left[\int_{-\infty}^{\infty} f(x,y)\,dx\right]dy$ 

$\qquad = \int_{-\infty}^{\infty} xg(x)\,dx + \int_{-\infty}^{\infty} yh(y)\,dy = E(x) + E(y)$. 

(4) $E(xy) = \int_{-\infty}^{\infty} x\left[\int_{-\infty}^{\infty} yf(x,y)\,dy\right]dx = \int_{-\infty}^{\infty} xh_1(x)\,dx$, 

where 

$$h_1(x) = \int_{-\infty}^{\infty} yf(x,y)\,dy.$$

Note that E(xy) can be evaluated, if the integrations can be performed, 
even though x and y are not independent. 

(5) If x and y are independent, then $f(x,y) = g(x) \cdot h(y)$ and hence 

$$E(xy) = \int_{-\infty}^{\infty} xg(x)\,dx\int_{-\infty}^{\infty} yh(y)\,dy = E(x)E(y).$$

5.7. Moments. The product moment $x^ry^s$ about the origin is given by 

$$\mu'_{rs} = E(x^ry^s) = \iint x^ry^sf(x,y)\,dx\,dy.$$

Let the mean of x be $\mu'_{10}$ and the mean of y be $\mu'_{01}$; then 

$$\mu_{rs} = E[(x - \mu'_{10})^r(y - \mu'_{01})^s] = \iint (x - \mu'_{10})^r(y - \mu'_{01})^sf(x,y)\,dx\,dy.$$

Example 5.8. To find $\sigma_x^2$, defined as equal to $\mu_{20}$, we have 

$$\mu_{20} = E(x - \mu'_{10})^2 = \mu'_{20} - (\mu'_{10})^2.$$

$$\therefore\ \sigma_x^2 = \mu'_{20} - (\mu'_{10})^2.$$

Similarly, 

$$\sigma_y^2 = \mu_{02} = \mu'_{02} - (\mu'_{01})^2.$$

Example 5.9. To find $\sigma_{xy}$, defined as equal to $\mu_{11}$ and called the 
covariance of x and y, we have 

$$\mu_{11} = E[(x - \mu'_{10})(y - \mu'_{01})] = E(xy) - \mu'_{10}\mu'_{01}.$$

$$\therefore\ \sigma_{xy} = \mu'_{11} - \mu'_{10}\mu'_{01}.$$

If x and y are independent, $E(xy) = \mu'_{11} = \mu'_{10}\mu'_{01}$; hence $\sigma_{xy} = 0$ for this 
condition. The correlation, $\rho_{xy}$, between x and y is defined as the 
nondimensional quantity 

$$\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x\sigma_y}.$$

It can be shown that $-1 \le \rho \le 1$. Also, if $\sigma_{xy} = 0$, then $\rho_{xy} = 0$. 

Example 5.10. To find the variance of (x + y), we have 

$$\sigma_{x+y}^2 = E(x - \mu'_{10} + y - \mu'_{01})^2 = \sigma_x^2 + \sigma_y^2 + 2\sigma_{xy}$$

$$= \sigma_x^2 + \sigma_y^2 + 2\rho\sigma_x\sigma_y.$$

If x and y are independent, that is, $\sigma_{xy} = 0$, then $\sigma_{x+y}^2 = \sigma_x^2 + \sigma_y^2$. 

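Example 5.11. To find E(xy) for the bivariate normal distribution 
with $\mu'_{10} = 0$, $\mu'_{01} = 0$, we have 

$$E(xy) = k\int_{-\infty}^{\infty}\int_{-\infty}^{\infty} xy\,e^{-\theta}\,dx\,dy,$$

where 

$$k = \frac{1}{2\pi\sigma_x\sigma_y\sqrt{1 - \rho^2}}$$

and 

$$\theta = \frac{1}{2(1 - \rho^2)}\left[\frac{x^2}{\sigma_x^2} + \frac{y^2}{\sigma_y^2} - \frac{2\rho xy}{\sigma_x\sigma_y}\right].$$

Note that for the bivariate normal, $\rho_{xy}$ has been abbreviated to $\rho$. 
The function $\theta$ may be written 

$$\theta = \frac{1}{2(1 - \rho^2)}\left[\frac{x^2}{\sigma_x^2} - \frac{2\rho xy}{\sigma_x\sigma_y} + \frac{\rho^2y^2}{\sigma_y^2}\right] + \frac{y^2}{2\sigma_y^2} = \frac{t^2}{2(1 - \rho^2)} + \frac{y^2}{2\sigma_y^2},$$

where $t = \dfrac{x}{\sigma_x} - \dfrac{\rho y}{\sigma_y}$. 

Then, using the methods of Sec. 5.5, 

$$E(xy) = k\sigma_x^2\int_{-\infty}^{\infty} y\left[\int_{-\infty}^{\infty}\left(t + \frac{\rho y}{\sigma_y}\right)e^{-t^2/2(1-\rho^2)}\,dt\right]e^{-y^2/2\sigma_y^2}\,dy$$

$$= \frac{\rho\sigma_x}{\sigma_y^2\sqrt{2\pi}}\int_{-\infty}^{\infty} y^2e^{-y^2/2\sigma_y^2}\,dy = \rho\sigma_x\sigma_y.$$

Example 5.12. Consider 

$$f(x,y)\,dx\,dy = e^{-x-y}\,dx\,dy, \qquad 0 < x < \infty,\ 0 < y < \infty.$$

Then, 

$$E(x) = \mu'_{10} = \int_0^{\infty} e^{-y}\left[\int_0^{\infty} xe^{-x}\,dx\right]dy = 1,$$

$$E(y) = \mu'_{01} = 1,$$

$$\mu'_{20} = \mu'_{02} = 2,$$

$$\mu'_{11} = \int_0^{\infty} xe^{-x}\,dx\int_0^{\infty} ye^{-y}\,dy = 1,$$

$$\sigma_{xy} = \mu'_{11} - \mu'_{10}\mu'_{01} = 0,$$

and 

$$\rho_{xy} = 0.$$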

Example 5.13. Consider the following discrete bivariate distribution 
with $p = \tfrac{1}{12}$. 

    x \ y      0      1      2     Total 
      0       2p     2p     2p      6p 
      1        p     4p      p      6p 
    Total     3p     6p     3p     12p 

Then, 

$$\mu'_{10} = \sum_x\sum_y xf(x,y) = .5,$$

$$\mu'_{01} = \sum_x\sum_y yf(x,y) = 1,$$

$$\mu'_{20} = \sum_x\sum_y x^2f(x,y) = .5,$$

$$\mu'_{02} = \sum_x\sum_y y^2f(x,y) = 1.5,$$

$$\sigma_x^2 = .5 - .25 = .25,$$

$$\sigma_y^2 = 1.5 - 1.0 = .5,$$

$$\mu'_{11} = \sum_x\sum_y xyf(x,y) = .5,$$

$$\sigma_{xy} = .5 - .5 = 0,$$

and 

$$\rho = 0.$$

Note that we cannot state that x and y are independent in this case 
even though $\rho = 0$, since f(x|y) is not the same for all values of y. It 
can be seen that $f(x|y = 0) = \tfrac{2}{3}, \tfrac{1}{3}$; $f(x|y = 1) = \tfrac{1}{3}, \tfrac{2}{3}$; and 
$f(x|y = 2) = \tfrac{2}{3}, \tfrac{1}{3}$. 
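The point of Example 5.13, that zero correlation does not imply independence, can be confirmed by computing the covariance and the conditional distributions from the table. The sketch below assumes the table layout with x = 0, 1 and y = 0, 1, 2, which is the layout consistent with the moments quoted in the example; the helper `expect` is an illustrative name.

```python
from fractions import Fraction

p = Fraction(1, 12)

# Joint frequency function keyed by (x, y), read from the table of Example 5.13.
f = {(0, 0): 2 * p, (0, 1): 2 * p, (0, 2): 2 * p,
     (1, 0): p,     (1, 1): 4 * p, (1, 2): p}

def expect(fn):
    """Expected value of fn(x, y) under the joint distribution."""
    return sum(fn(x, y) * pr for (x, y), pr in f.items())

mean_x = expect(lambda x, y: x)
mean_y = expect(lambda x, y: y)
cov_xy = expect(lambda x, y: x * y) - mean_x * mean_y

# Marginal h(y) and conditional f(x|y) listed for x = 0, 1.
h = {y: sum(pr for (_, yy), pr in f.items() if yy == y) for y in (0, 1, 2)}
cond = {y: [f[x, y] / h[y] for x in (0, 1)] for y in (0, 1, 2)}

print(cov_xy, cond)
```

The covariance is exactly zero, yet the conditional distribution of x changes with y, so x and y are not independent.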

5.8. Moment and Cumulant Generating Functions. For the bivariate 
case, the moment generating function about the origin is defined to be 
$m(t_x,t_y) = E(e^{xt_x + yt_y})$. The moment generating function about $\mu'_{10}$ and 
$\mu'_{01}$ is defined as 

$$M(t_x,t_y) = E(e^T) = \iint e^Tf(x,y)\,dx\,dy,$$

where $T = (x - \mu'_{10})t_x + (y - \mu'_{01})t_y$ and the double integration is 
performed over the ranges of x and y. 

Then, 

$$M(t_x,t_y) = \sum_{r,s=0}^{\infty} \mu_{rs}\frac{t_x^rt_y^s}{r!\,s!},$$

and 

$$\mu_{rs} = \left.\frac{\partial^{r+s}M}{\partial t_x^r\,\partial t_y^s}\right|_{t_x = t_y = 0}.$$

Note that $\mu_{00} = 1$, $\mu_{01} = \mu_{10} = 0$. 

The cumulant generating function in this case is given by 

$$K = \log m(t_x,t_y) = \sum_{r,s=0}^{\infty} \kappa_{rs}\frac{t_x^rt_y^s}{r!\,s!}$$

and 

$$\kappa_{rs} = \left.\frac{\partial^{r+s}K}{\partial t_x^r\,\partial t_y^s}\right|_{t_x = t_y = 0}.$$

If $f(x,y) = g(x) \cdot h(y)$, the moments of x and y may be computed 
separately. In this case 

$$M(t_x,t_y) = \int e^{T_x}g(x)\,dx \cdot \int e^{T_y}h(y)\,dy = M(t_x) \cdot M(t_y),$$

where 

$$T_x = (x - \mu'_{10})t_x, \qquad T_y = (y - \mu'_{01})t_y,$$

and the two integrations are taken over the respective ranges of x and y. 
Hence, we have the following theorem: 

Theorem 5.1. The moment generating function about the origin of the 
sum of two independent variates is the product of the moment generating 
function of each. 

Proof: 

$$m(t)_{x+y} = E[e^{t(x+y)}] = E(e^{tx})E(e^{ty}) = m(t)_x \cdot m(t)_y.$$

The theorem is also true for the moment generating function about the 
mean of the sum of two independent variates. 

For the bivariate distribution 

$$K = \log m(t_x) + \log m(t_y) = K(t_x) + K(t_y).$$
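Theorem 5.1 can be illustrated with the two-dice problem: the moment generating function of the sum of the two dice, computed from the distribution of the sum, equals the square of the moment generating function of a single die. This is a numerical sketch only; the value of t and the function names are arbitrary choices.

```python
from itertools import product
from math import exp, isclose, log

def mgf(pmf, t):
    """m(t) = E(e^(tx)) for a discrete frequency function given as a dict."""
    return sum(exp(t * x) * p for x, p in pmf.items())

die = {x: 1 / 6 for x in range(1, 7)}

# Distribution of the sum of two independent dice, built from f(x,y) = g(x) h(y).
total = {}
for x, y in product(die, repeat=2):
    total[x + y] = total.get(x + y, 0.0) + die[x] * die[y]

t = 0.2
lhs = mgf(total, t)       # m(t) of x + y
rhs = mgf(die, t) ** 2    # product of the two individual m(t)
print(lhs, rhs)

# Taking logarithms gives the additive form K = K(t_x) + K(t_y) at t_x = t_y = t.
print(log(lhs), 2 * log(mgf(die, t)))
```

The agreement holds for every real t, since each side is a finite sum of exponentials.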

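5.9. Extension to k Variates. Let 

$$f\{x_i\}\prod_{i=1}^{k} dx_i$$

represent the joint probability distribution of the k variates, $x_1, x_2, \ldots, x_k$, 
where $f\{x_i\}$ is the frequency function of the k variates and 

$$\prod_{i=1}^{k} dx_i = dx_1\,dx_2 \cdots dx_k.$$

The following properties are those for the continuous case, but they can 
be applied equally well to discrete variates: 

(a) $\mathcal{P}(w) = \displaystyle\int \cdots \int_w f\{x_i\}\prod_{i=1}^{k} dx_i$, 
where w is a subregion in the k-dimensional space, W. 

(b) $g(x_1) = \displaystyle\int \cdots \int_W f\{x_i\}\prod_{i=2}^{k} dx_i$. 

(c) $f(x_1|x_2, \ldots, x_k) = \dfrac{f\{x_i\}}{f(x_2, \ldots, x_k)}$, 
where $f(x_2, \ldots, x_k) = \int f\{x_i\}\,dx_1$ over the range of $x_1$. 

(d) For a transformation $u_i = u_i\{x_j\}$, i = 1, 2, . . . , k, 

$$\frac{1}{J} = \begin{vmatrix} \dfrac{\partial u_1}{\partial x_1} & \dfrac{\partial u_1}{\partial x_2} & \cdots & \dfrac{\partial u_1}{\partial x_k} \\ \vdots & \vdots & & \vdots \\ \dfrac{\partial u_k}{\partial x_1} & \dfrac{\partial u_k}{\partial x_2} & \cdots & \dfrac{\partial u_k}{\partial x_k} \end{vmatrix}.$$

(e) $E[\theta\{x_i\}] = \displaystyle\int \cdots \int_W \theta\{x_i\}f\{x_i\}\prod_{i=1}^{k} dx_i$. 

(f) $E[e^{t\theta\{x_i\}}] = \displaystyle\int \cdots \int_W e^{t\theta\{x_i\}}f\{x_i\}\prod_{i=1}^{k} dx_i$. 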

Example 5.14. The multinomial distribution is a generalization of the 
binomial distribution. Any one of the events $y_1, y_2, \ldots, y_k$ can occur 
with respective probabilities $p_1, p_2, \ldots, p_k$ on a single trial, $\sum_{i=1}^{k} p_i = 1$. 
If n trials are made, the probability that $y_1$ occurs $x_1$ times, $y_2$ occurs $x_2$ 
times, etc., $\sum_{i=1}^{k} x_i = n$, is given by 

$$f\{x_i\} = \frac{n!}{\prod_{i=1}^{k} x_i!}\prod_{i=1}^{k} p_i^{x_i}.$$

This is the general term of the expansion of 

$$(p_1 + p_2 + \cdots + p_k)^n.$$

For example, a single die can show the number 1, 2, . . . , 6 on the upper 
face with equal probabilities, $p_i = \tfrac{1}{6}$. 

In this case the moment generating function about the origin is 

$$m\{\theta_i\} = E(e^{\Sigma\theta_ix_i}) = \sum_x e^{\Sigma\theta_ix_i}\,\frac{n!}{\prod_i x_i!}\prod_i p_i^{x_i},$$

where the symbol $\sum_x$ is to be interpreted as meaning the summation over 
all values of x, such that $\sum x_i = n$. Then 

$$m\{\theta_i\} = \sum_x \frac{n!}{\prod_i x_i!}\prod_i (p_ie^{\theta_i})^{x_i} = (p_1e^{\theta_1} + p_2e^{\theta_2} + p_3e^{\theta_3} + \cdots + p_ke^{\theta_k})^n.$$

$$\mu'_{1i} = \left.\frac{\partial m\{\theta_i\}}{\partial\theta_i}\right|_{\{\theta_i\} = 0} = np_i,$$

and 

$$\mu'_{2i} = \left.\frac{\partial^2m\{\theta_i\}}{\partial\theta_i^2}\right|_{\{\theta_i\} = 0} = np_i + n(n-1)p_i^2.$$

Hence, 

$$\sigma_i^2 = \mu'_{2i} - (\mu'_{1i})^2 = np_i - np_i^2 = np_i(1 - p_i).$$

Also, 

$$\mu'_{ij} = \left.\frac{\partial^2m\{\theta_i\}}{\partial\theta_i\,\partial\theta_j}\right|_{\{\theta_i\} = 0} = n(n-1)p_ip_j.$$

Hence, 

$$\sigma_{ij} = \mu'_{ij} - \mu'_{1i}\mu'_{1j} = n(n-1)p_ip_j - n^2p_ip_j = -np_ip_j.$$
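The multinomial means, variances, and covariances of Example 5.14 can be confirmed for a small case by enumerating every outcome with $\sum x_i = n$. In the sketch below the values n = 4 and $p = (1/2, 1/3, 1/6)$ are illustrative choices, and the function names are not from the text.

```python
from fractions import Fraction
from itertools import product
from math import factorial

def multinomial_pmf(n, probs):
    """Exact f{x_i} = n!/(x_1! ... x_k!) * p_1^x_1 ... p_k^x_k over all x with sum n."""
    pmf = {}
    for xs in product(range(n + 1), repeat=len(probs)):
        if sum(xs) != n:
            continue
        coef = factorial(n)
        for x in xs:
            coef //= factorial(x)   # stays an exact integer at every step
        prob = Fraction(coef)
        for x, p in zip(xs, probs):
            prob *= p ** x
        pmf[xs] = prob
    return pmf

n = 4
probs = (Fraction(1, 2), Fraction(1, 3), Fraction(1, 6))   # illustrative values
pmf = multinomial_pmf(n, probs)

def expect(fn):
    """Expected value of fn(xs) under the multinomial distribution."""
    return sum(fn(xs) * pr for xs, pr in pmf.items())

mean0 = expect(lambda xs: xs[0])
mean1 = expect(lambda xs: xs[1])
var0 = expect(lambda xs: xs[0] ** 2) - mean0 ** 2
cov01 = expect(lambda xs: xs[0] * xs[1]) - mean0 * mean1

print(mean0, var0, cov01)
```

The negative covariance reflects the constraint $\sum x_i = n$: a larger count for one event leaves fewer trials for the others.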


EXERCISES† 

5.1. The marginal distribution of x for a bivariate distribution function 
f(x,y) is 

$$g(x) = \int f(x,y)\,dy.$$

The limits for y must be determined by the region W within which 
f(x,y) is defined. If W includes the entire (x,y) plane, the limits are 
$(-\infty, \infty)$. However, consider a problem of this nature: $f(x,y) = kxy$ 
for $x < y$, $0 < y < 1$; then 

$$g(x) = \int_x^1 kxy\,dy, \qquad 0 < x < 1,$$

and 

$$h(y) = \int_0^y kxy\,dx, \qquad 0 < y < 1.$$

Find the explicit values of g(x) and h(y). 

† The following exercises contain important theory which will be referred to 
subsequently and hence should be worked by all students: 5.1, 5.2, 5.4, 5.5, and 5.6. 

5.2. Given a continuous bivariate frequency function f(x,y) and the 
corresponding marginal distributions, g(x) and h(y). Set up the integrals 
for the following: (a) mean and variance for y for a given x, $(\mu_{y|x}, \sigma_{y|x}^2)$; 
(b) mean and variance of x for a given y. 

5.3. Given the bivariate frequency function $f(x,y) = 2/a^2$, $0 < x < y$, 
$0 < y < a$. Show that (a) $\iint f(x,y)\,dx\,dy = 1$, over the respective 
ranges of x and y; (b) $g(x) = 2(a - x)/a^2$; (c) $h(y) = (2/a^2)y$; 
(d) $f(x|y) = 1/y$; (e) $\mu'_{10} = a/3$, $\mu'_{01} = 2a/3$; (f) $\rho = \tfrac{1}{2}$; 
(g) $\mu_{y|x} = (a + x)/2$; (h) $\mu_{x|y} = y/2$. 

5.4. In Exercise 5.2(a) the mean value of y for a given x $(\mu_{y|x})$ is a 
function of x and hence defines a curve in the (x,y) plane called the curve 
of regression of y on x. If the regression of y on x is linear $(\mu_{y|x} = \alpha + \beta x)$, 
then 

$$\frac{\int_{-\infty}^{\infty} yf(x,y)\,dy}{g(x)} = \alpha + \beta x,$$

or 

$$\int_{-\infty}^{\infty} yf(x,y)\,dy = \alpha g(x) + \beta xg(x).$$

By integrating each side of the last equation with respect to x, show that 

$$\mu'_{01} = \alpha + \beta\mu'_{10}.$$

Before integrating, as above, multiply both sides by x, and show that 

$$\mu'_{11} = \alpha\mu'_{10} + \beta\mu'_{20}.$$

Hence, find the values of $\alpha$ and $\beta$ in terms of the moments of the original 
distribution, and show that 

$$\mu_{y|x} = \mu'_{01} + \rho\,\frac{\sigma_y}{\sigma_x}\,(x - \mu'_{10}).$$

5.5. In Exercise 5.3(a) if the variance of y for a given x is averaged 
over all values of x, we have 

$$\sigma_{y \cdot x}^2 = \int_{-\infty}^{\infty} \sigma_{y|x}^2\,g(x)\,dx = \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} (y - \mu_{y|x})^2f(x,y)\,dy\,dx.$$

If $\mu_{y|x} = \alpha + \beta x$, as given in Exercise 5.4, show that 

$$\sigma_{y \cdot x}^2 = \sigma_y^2(1 - \rho^2).$$

5.6. Work Exercises 5.3(a), 5.4, and 5.5 for a normal bivariate 
distribution. 

5.7. As an example of the distribution of a function of two variables
for a discrete bivariate distribution, consider the distribution of the
sum of two independent Poisson variates, x and y. The joint distribution
of x and y is

$$f(x,y) = e^{-(m_1+m_2)}\,\frac{m_1^x\,m_2^y}{x!\,y!}.$$

Let $z = x + y$, or $x = z - y$; then

$$f(z,y) = e^{-(m_1+m_2)}\,\frac{m_1^{z-y}\,m_2^y}{(z - y)!\,y!}.$$

Sum out the variable y over the range $0 \le y \le z$, and show that

$$f(z) = e^{-(m_1+m_2)}\,\frac{(m_1 + m_2)^z}{z!}.$$

Hence show that the sum of two independent Poisson variates is
distributed as a single Poisson variate with a mean equal to the sum of the
two single means ($m = m_1 + m_2$).
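
The summation asked for in Exercise 5.7 can be checked numerically before it is proved. The following is a minimal sketch only, not part of the original exercise; the function names and the means 1.5 and 2.5 are assumptions chosen for illustration.

```python
import math

def poisson_pmf(k, m):
    """Poisson probability of the value k when the mean is m."""
    return math.exp(-m) * m**k / math.factorial(k)

def sum_pmf(z, m1, m2):
    """P(x + y = z) for independent Poisson x and y, obtained by
    summing the joint probability f(z - y, y) over y = 0, 1, ..., z."""
    return sum(poisson_pmf(z - y, m1) * poisson_pmf(y, m2) for y in range(z + 1))

m1, m2 = 1.5, 2.5  # illustrative means (assumed values)
for z in range(6):
    # Left: summed-out joint distribution; right: Poisson with mean m1 + m2.
    print(z, round(sum_pmf(z, m1, m2), 6), round(poisson_pmf(z, m1 + m2), 6))
```

The two printed columns should agree for every z, which is the content of the exercise.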

5.8. Given $f(x,y) = 6(1 - x - y)$ for (x,y) contained within the
triangle bounded by x = 0, y = 0, x + y = 1.

(a) Find the means and variances of x and y and the covariance of
x and y.

(b) Find the equation of the regression line of y on x and $\sigma^2_{y\cdot x}$.

5.9. Given $f(x,y) = kxy(1 - x - y)$ over the same triangle as in
Exercise 5.8.

(a) Find the value of k which makes f(x,y) a frequency distribution.

(b) Find the marginal distribution g(x).

(c) Find $\mu_{y|x}$.

5.10. If x is a discrete variate having the Poisson distribution

$$g(x) = \frac{m^x e^{-m}}{x!}, \qquad 0 \le x < \infty,$$

and y is another discrete variate having the binomial distribution

$$f(y|x) = C(x,y)\,p^y q^{x-y}, \qquad 0 \le y \le x,$$

show that the marginal distribution of y is

$$\frac{(mp)^y e^{-mp}}{y!}.$$

5.11. Given that x and y are normally distributed with zero means
and variances $\sigma_x^2$ and $\sigma_y^2$. Find the distribution of z = x + y.

5.12. If x and y represent the number of dots appearing on dice A
and B, respectively, what is the probability that in throwing the two
dice x + 2y < 6?

5.13. Given $f(x|y) = y^x e^{-y}/x!$ and $h(y) = e^{-y}$, where x is discrete
(x = 0, 1, . . .) and y continuous (y > 0). Show that $g(x) = (1/2)^{x+1}$.

5.14. Given two continuous variates x and y (x, y > 0) with the
frequency distribution $f(x,y) = 2(1 + x + y)^{-3}$. Find g(u) and
$\mathcal{P}(u < 1)$, where u = x + y.

5.15. Given f(x,y) = 1 over the square (0 < x, y < 1). Show that
$\mathcal{P}(xy > u) = 1 - u + u \log u$.

5.16. Show that the joint moment generating function of $x_1$ and $x_2$
from the bivariate normal distribution with means $\mu_1$ and $\mu_2$, variances
$\sigma_1^2$ and $\sigma_2^2$, and correlation $\rho$ is given by

$$m(t_1,t_2) = e^{t_1\mu_1 + t_2\mu_2 + \frac{1}{2}(t_1^2\sigma_1^2 + 2\rho t_1t_2\sigma_1\sigma_2 + t_2^2\sigma_2^2)}.$$

5.17. Use the results of Exercise 5.16 to find the variances and
covariance of $x_1$ and $x_2$.





CHAPTER 6 


DERIVED SAMPLING DISTRIBUTIONS AND ORTHOGONAL 
LINEAR FUNCTIONS 

6.1. Introduction. For the sequence of the statistical method
(specification, distribution, estimation, and tests of hypotheses) we have
considered certain parent populations and their properties which are usually
specified in applied statistical investigations. After problems of
specification it would seem logical to discuss problems of estimation next in
order. Such problems involve determining what functions of the sample
observations should be used to “best” estimate the parameters of the specified
population distribution, where “best” must be defined in some exact
manner. For example, if $X_1, X_2, \ldots, X_n$ is a sample of n observations,
specified as having been drawn in some manner from a normal distribution
with mean, $\mu$, and variance, $\sigma^2$, what two functions of the observations
should be used to “best” estimate $\mu$ and $\sigma^2$? A function of the
observations used to estimate a population parameter is called an estimator.
The numerical value obtained by using the estimator is called an estimate. It
seemed appropriate to earlier workers in statistics to use the sample mean,
$\bar{X} = \sum X/n$, and the sample variance, $(s')^2 = \sum (X - \bar{X})^2/n$, as
estimators of the two population parameters $\mu$ and $\sigma^2$. However, as will be
shown later, in cases where the sample is a random sample, the “best” sample
estimate of $\sigma^2$ is $s^2 = \sum (X - \bar{X})^2/(n - 1)$. A random sample is a sample
drawn in such a manner that the probability of obtaining any member is
independent of the probability of obtaining any other member.

A less restrictive method of estimating the value of a parameter, $\theta$, is a
method which derives limits $c_1$ and $c_2$ which are functions of the sample
values $(X_1, X_2, \ldots, X_n)$. The interval $(c_1, c_2)$ will contain the parameter
$\theta$ a certain percentage of the time. The limits are thus functions of the
sample and this percentage, which is called the confidence probability.
It is understood that in each repeated sample a new set of confidence
limits is determined. The concepts of confidence limits were introduced
by Neyman.$^1$

Various methods of estimating the confidence interval will be discussed
in later chapters. Fisher$^2$ uses the term fiducial limits to indicate
essentially the same concept.

In the chronological development of modern statistical methodology,
however, the distribution of estimators and the distributions of certain
functions of these estimators used in making tests of hypotheses and
setting confidence limits did not wait upon the development of a sound
theory of estimation. In this chapter, then, we shall assume, for the
time being, that certain estimators are the “best” estimates of the
corresponding population parameters and that certain functions of these
estimators used in making tests of hypotheses and setting confidence
limits are the “best” such functions.

6.2. Derived Sampling Distribution Problems. In a typical applied 
statistical investigation, for example, a sample survey or an experiment, 
we specify that the observations obtained by some sampling process 
are drawn from some particular parent population distribution. We 
then calculate certain functions of the observations as estimates of the 
parameters of the specified population. Now, in order to set confidence 
limits for the population parameters or to make tests of hypotheses 
concerning the population parameters, it is necessary to know the 
probability distributions of the estimators, for example, of $\bar{X}$ and $s^2$ for a
normal parent population, and of certain functions of the statistics,
for example, of $\chi^2$, t, and F. Mathematically these probability
distributions are derived from the specified parent population distributions
and hence are called derived sampling distributions.

6.3. Random and Systematic Samples. In applied statistics two 
kinds of samples are in common use, (a) the random sample as defined
above, and (b) the systematic sample. In the latter the first member
may be chosen at random, but subsequent members depend upon the
position or value of the preceding members (a systematic sample after
a random start), or all members of the sample may be chosen
systematically, including the first member.

Soil sampling, to study nutritive$^3$ and other components$^4$ of the soil,
in most cases makes use of some form of systematic sampling. Soil
samples are selected from various spots in a field, and chemical
determinations are made to determine what nutrients are required to bring the
soil up to a suitable productive capacity. It would be possible to lay a
grid down on the field and select the samples at random from the grid.
This method, however, presents many practical difficulties such as the
exact determination of the selected sample point and the excessive time
consumed in finding these points, especially if the field is irregular in
both boundary and contour. It is much easier to select the first point
at random by selecting two random numbers to determine the two
coordinates of the starting point and then proceed a certain number of paces
from this point to the next one by a predetermined route.

Another systematic method is to predetermine a definite route and
collect samples only along this path. This latter method is used quite
extensively in sampling forest stands$^5$ to estimate the amount of usable
timber.

Most economic data are far from random but have been analyzed in 
many instances in the past as though they were random because of lack 
of techniques for analyzing nonrandom data. More exact methods of 
analyzing this type of nonrandom data have been devised. The chief 
obstacle to the use of the more exact techniques is computational. The 
rapid development of electronic computers may overcome this difficulty. 

The derived sampling distributions obtained in the next chapter will
assume a random sample of n observations unless otherwise stated.
Randomness in the sample must be ensured by the use of an objective
method of selection. Tables of random numbers provide such a means
and have been made available by Tippett,$^6$ Fisher and Yates,$^7$ Snedecor,$^8$
and others.
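
The distinction between a random sample and a systematic sample after a random start can be sketched with a pseudo-random number generator standing in for a table of random numbers. This is only an illustration under stated assumptions: the function names, the list of 100 numbered sampling points, and the fixed seed are all hypothetical.

```python
import random

def random_sample(population, n, rng):
    """Simple random sample of n members, drawn without replacement."""
    return rng.sample(population, n)

def systematic_sample(population, n, rng):
    """Systematic sample after a random start: choose a random position
    within the first interval, then take every k-th member after it."""
    k = len(population) // n       # sampling interval
    start = rng.randrange(k)       # random start in 0, 1, ..., k - 1
    return [population[start + i * k] for i in range(n)]

rng = random.Random(1)             # fixed seed so the sketch is reproducible
field = list(range(100))           # 100 hypothetical sampling points
print(random_sample(field, 5, rng))
print(systematic_sample(field, 5, rng))
```

In the systematic sample only the first member is left to chance; every later member is fixed by the position of the one before it.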

6.4. Linear Functions. In the derivations of sampling distributions
in the next chapter it will frequently be necessary to know the distribution
of some linear functions of the members of a sample. Let such a linear
function be given by

$$l = a_1X_1 + a_2X_2 + \cdots + a_nX_n = \sum_{i=1}^{n} a_iX_i,$$

where $a_i$ is some fixed constant and the sample, here not necessarily
random, is represented by $X_1, X_2, \ldots, X_n$ or $\{X_i\}$. In order to obtain
some general results, in the discussion following, each $X_i$ is assumed to be
drawn from a population with mean $\mu_i$ and variance $\sigma_i^2$.

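6.5. Properties of Linear Functions. It follows that

$$E(l) = \sum_{i=1}^{n} a_iE(X_i) = \sum_{i=1}^{n} a_i\mu_i.$$

Also,

$$\sigma_l^2 = E[l - E(l)]^2 = E\Big[\sum_{i=1}^{n} a_i(X_i - \mu_i)\Big]^2
= \sum_{i=1}^{n} a_i^2\sigma_i^2 + 2\sum_{i<j} a_ia_j\sigma_{ij},$$

where $\sigma_{ij}$ is the covariance of $X_i$ and $X_j$. If $\{X_i\}$ is a random sample,
then $\sigma_{ij} = 0$ and

$$\sigma_l^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2,$$

but this will not be true for a systematic sample since $\sigma_{ij}$ will not be zero
in that case.

Finally, if all $\mu_i = \mu$ and all $\sigma_i^2 = \sigma^2$, then

$$E(l) = \mu\sum_{i=1}^{n} a_i$$

and

$$\sigma_l^2 = \sigma^2\sum_{i=1}^{n} a_i^2.$$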

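Example 6.1. If each $a_i = 1/n$, then $l = (1/n)\sum_{i=1}^{n} X_i$, which is the
sample mean, $\bar{X}$. Further, for any sample

$$E(\bar{X}) = \frac{1}{n}(n\mu) = \mu.$$

Also, if the sample be random,

$$\sigma_{\bar{X}}^2 = E(\bar{X} - \mu)^2 = n\sigma^2\cdot\frac{1}{n^2} = \frac{\sigma^2}{n}.$$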

In Sec. 6.1 mention was made of a “best” estimator of a population 
parameter. One property usually desired in a “best” estimator is that 
of being unbiased. An estimator of a population parameter is said to 
be unbiased if its expected value is equal to the population parameter. 
It follows from Example 6.1 that $\bar{X}$ is an unbiased estimator of $\mu$ whether
the sample is random or systematic. However, using an analogous 
method to that for a random sample for obtaining an estimate of the 
variance of a systematic sample does not lead to an unbiased estimate. 

Example 6.2. Let $l = \sum_{i=1}^{n} a_i(X_i - \mu)$, for a random sample from a
population with $\mu_i = \mu$, $\sigma_i^2 = \sigma^2$; then

$$E(l) = \sum_{i=1}^{n} a_iE(X_i - \mu) = 0.$$

Also,

$$\sigma_l^2 = E[l - E(l)]^2
= E\Big[\sum_{i=1}^{n} a_i^2(X_i - \mu)^2 + 2\sum_{i<j} a_ia_j(X_i - \mu)(X_j - \mu)\Big]
= \sigma^2\sum_{i=1}^{n} a_i^2.$$

Example 6.3. To find E(S) where $S = \sum_{i=1}^{n} (X_i - \bar{X})^2$, under
conditions given in Example 6.2, we find

$$S = \sum_{i=1}^{n} [(X_i - \mu) - (\bar{X} - \mu)]^2 = \sum_{i=1}^{n} (X_i - \mu)^2 - n(\bar{X} - \mu)^2.$$

Now

$$E\Big[\sum_{i=1}^{n} (X_i - \mu)^2\Big] = n\sigma^2 \qquad\text{and}\qquad
E[n(\bar{X} - \mu)^2] = n\cdot\frac{\sigma^2}{n} = \sigma^2$$

by Example 6.1. Hence,

$$E(S) = n\sigma^2 - \sigma^2 = (n - 1)\sigma^2.$$

Now, if we set $s^2 = S/(n - 1)$, then $E(s^2) = \sigma^2$ and therefore $s^2$ is an
unbiased estimate of $\sigma^2$ for a random sample.
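
The result E(S) = (n - 1)σ² can be verified exactly for a small population by averaging over every possible sample. The sketch below is an illustration only: the population (the six faces of a fair die, each equally likely, with independent draws) and the function name are assumptions, and exact fractions are used so that no rounding enters.

```python
from fractions import Fraction
from itertools import product

def expected_variance_estimates(values, n):
    """Average S/n and S/(n-1) over every equally likely ordered sample
    of size n drawn independently from the equally likely `values`."""
    total_divisor_n = Fraction(0)
    total_divisor_n_minus_1 = Fraction(0)
    count = 0
    for sample in product(values, repeat=n):
        mean = Fraction(sum(sample), n)
        S = sum((x - mean) ** 2 for x in sample)   # sum of squared deviations
        total_divisor_n += S / n
        total_divisor_n_minus_1 += S / (n - 1)
        count += 1
    return total_divisor_n / count, total_divisor_n_minus_1 / count

faces = [1, 2, 3, 4, 5, 6]                  # hypothetical population: a fair die
mu = Fraction(sum(faces), len(faces))
sigma2 = sum((x - mu) ** 2 for x in faces) / len(faces)   # population variance
biased, unbiased = expected_variance_estimates(faces, 3)
print(sigma2, biased, unbiased)
```

With the divisor n - 1 the average equals the population variance, while the divisor n gives (n - 1)/n of it.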

For a discussion of the above problems for systematic samples, consult 
W. G. Madow and Lillian Madow, 9 Lillian Madow, 10 and Cochran. 11 
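
6.6. Orthogonal Linear Forms. Consider the two linear forms
$l_1 = \sum_{i=1}^{n} a_iX_i$ and $l_2 = \sum_{i=1}^{n} b_iX_i$. Since $E(l_1) = \sum_{i=1}^{n} a_i\mu_i$ and
$E(l_2) = \sum_{i=1}^{n} b_i\mu_i$, then $E(l_1 \pm l_2) = \sum_{i=1}^{n} (a_i \pm b_i)\mu_i$.

Now, if $\{X_i\}$ is a random sample with $\mu_i = \mu$, $\sigma_i^2 = \sigma^2$, then

$$\sigma_{l_1}^2 = \sigma^2\sum_{i=1}^{n} a_i^2, \qquad
\sigma_{l_2}^2 = \sigma^2\sum_{i=1}^{n} b_i^2, \qquad\text{and}\qquad
\sigma_{l_1l_2} = \sigma^2\sum_{i=1}^{n} a_ib_i.$$

Hence the condition that would make $l_1$ and $l_2$ uncorrelated is that
$\sum_{i=1}^{n} a_ib_i = 0$. Two uncorrelated linear forms are said to be orthogonal.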

Example 6.4. For a random sample of 5 drawn from a normal population
with mean, $\mu$, and variance, $\sigma^2$, the mean value and variance of

$$l_1 = X_1 + X_2 + X_3 + X_4 + X_5 = 5\bar{X}$$

are, respectively,

$$E(l_1) = \mu + \mu + \mu + \mu + \mu = 5\mu,$$

$$\sigma_{l_1}^2 = 5\sigma^2.$$

For a second linear form

$$l_2 = -2X_1 - X_2 + X_4 + 2X_5,$$

we obtain

$$E(l_2) = -2\mu - \mu + \mu + 2\mu = 0$$

and also

$$\sigma_{l_1l_2} = \sigma^2[(1)(-2) + (1)(-1) + (1)(1) + (1)(2)] = 0.$$

For a third linear form

$$l_3 = 2X_1 - X_2 - 2X_3 - X_4 + 2X_5,$$

we obtain

$$E(l_3) = 2\mu - \mu - 2\mu - \mu + 2\mu = 0,$$

also

$$\sigma_{l_1l_3} = \sigma^2[(1)(2) + (1)(-1) + (1)(-2) + (1)(-1) + (1)(2)] = 0,$$

and

$$\sigma_{l_2l_3} = \sigma^2[(-2)(2) + (-1)(-1) + (0)(-2) + (1)(-1) + (2)(2)] = 0.$$

Notice that the sum of the coefficients of $l_2$ and $l_3$, respectively, is zero
and that $l_2$ and $l_3$ are uncorrelated with $l_1$, illustrating the theory in the
paragraph above.

Example 6.5. To determine a fourth orthogonal linear form,

$$l_4 = b_1X_1 + b_2X_2 + b_3X_3 + b_4X_4 + b_5X_5,$$

to $l_1$, $l_2$, and $l_3$ of Example 6.4, we find

$$l_1l_4:\quad b_1 + b_2 + b_3 + b_4 + b_5 = 0,$$

$$l_2l_4:\quad -2b_1 - b_2 + b_4 + 2b_5 = 0,$$

$$l_3l_4:\quad 2b_1 - b_2 - 2b_3 - b_4 + 2b_5 = 0.$$

Eliminating $b_3$ from the first and third equations and then $b_2$ from the
remaining two equations, we find

$$b_1 + b_4 + 3b_5 = 0.$$

This condition will be satisfied if we let $b_5 = 1$, $b_4 = -2$, and $b_1 = -1$.
Then, we find $b_2 = 2$ and $b_3 = 0$. Hence a fourth orthogonal form
desired is

$$l_4 = -X_1 + 2X_2 - 2X_4 + X_5.$$

There are an infinite number of such functions.
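
The orthogonality of the four forms in Examples 6.4 and 6.5 can be confirmed by computing the sum of products of coefficients for every pair. The short sketch below is an illustration only; the dictionary layout and the helper name `dot` are assumptions made for the example.

```python
from itertools import combinations

# Coefficient vectors of the linear forms in Examples 6.4 and 6.5.
forms = {
    "l1": [1, 1, 1, 1, 1],
    "l2": [-2, -1, 0, 1, 2],
    "l3": [2, -1, -2, -1, 2],
    "l4": [-1, 2, 0, -2, 1],
}

def dot(a, b):
    """Sum of products of corresponding coefficients."""
    return sum(x * y for x, y in zip(a, b))

for (name_a, a), (name_b, b) in combinations(forms.items(), 2):
    # A zero sum of products means the two forms are uncorrelated.
    print(name_a, name_b, dot(a, b))
```

Every pair should give zero, so the four forms are mutually orthogonal.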

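6.7. Linear Forms with Normally Distributed Variates. Suppose
that the $X_i$ in the linear form $l = \sum_{i=1}^{n} a_iX_i$ each follows a normal parent
population distribution with mean, $\mu_i$, and variance, $\sigma_i^2$. Then, the
probability of getting the particular $X_i$'s in a random sample of n will be

$$\frac{1}{(2\pi)^{n/2}\prod_{i=1}^{n}\sigma_i}\,
e^{-\frac{1}{2}\sum_{i=1}^{n}[(X_i-\mu_i)^2/\sigma_i^2]}\prod_{i=1}^{n} dX_i.$$

It is desired to find the distribution of $[l - E(l)]$.

The moment generating function of $[l - E(l)]$ is

$$M(t) = \frac{1}{(2\pi)^{n/2}\prod_{i=1}^{n}\sigma_i}
\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}
e^{t\sum a_i(X_i-\mu_i)}\,e^{-\frac{1}{2}\sum[(X_i-\mu_i)^2/\sigma_i^2]}\prod_{i=1}^{n} dX_i
= \prod_{i=1}^{n}\frac{1}{\sigma_i\sqrt{2\pi}}\int_{-\infty}^{\infty}
e^{-(X_i-\mu_i)^2/2\sigma_i^2 + ta_i(X_i-\mu_i)}\,dX_i.$$

By completing the square of the exponent we have

$$-\frac{1}{2\sigma_i^2}\big[(X_i-\mu_i)^2 - 2\sigma_i^2ta_i(X_i-\mu_i) + \sigma_i^4t^2a_i^2\big]
+ \frac{\sigma_i^2t^2a_i^2}{2}.$$

Hence,

$$M(t) = \prod_{i=1}^{n} e^{\sigma_i^2a_i^2t^2/2}\,\frac{1}{\sigma_i\sqrt{2\pi}}
\int_{-\infty}^{\infty} e^{-\frac{1}{2\sigma_i^2}[(X_i-\mu_i)-\sigma_i^2ta_i]^2}\,dX_i
= e^{(t^2/2)\sum a_i^2\sigma_i^2}.$$

Now, the moment generating function for the normal distribution

$$\frac{1}{\sigma\sqrt{2\pi}}\,e^{-y^2/2\sigma^2}\,dy, \qquad -\infty < y < \infty,$$

is

$$M(t) = \frac{1}{\sigma\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{ty}\,e^{-y^2/2\sigma^2}\,dy = e^{\sigma^2t^2/2}.$$

Hence, we may say that l is normally distributed with mean $E(l) = \sum_{i=1}^{n} a_i\mu_i$
and variance $\sigma_l^2 = \sum_{i=1}^{n} a_i^2\sigma_i^2$. This result is analogous to that found in
Sec. 6.6 for the mean and variance for nonspecified parent population
distributions.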
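
The mean and variance of a linear form of independent normal variates can be checked by simulation. The following is a rough sketch under stated assumptions: the coefficients, means, and standard deviations are arbitrary illustrative values, the function name is hypothetical, and the seed is fixed only so that the run is reproducible.

```python
import random
import statistics

def simulate_linear_form(a, mu, sigma, reps, rng):
    """Draw `reps` values of l = sum(a_i * X_i), where the X_i are
    independent and X_i is N(mu_i, sigma_i^2)."""
    return [
        sum(ai * rng.gauss(mi, si) for ai, mi, si in zip(a, mu, sigma))
        for _ in range(reps)
    ]

a = [1.0, -2.0, 0.5]       # illustrative coefficients (assumed)
mu = [0.0, 1.0, 2.0]       # illustrative means (assumed)
sigma = [1.0, 2.0, 3.0]    # illustrative standard deviations (assumed)

rng = random.Random(0)
values = simulate_linear_form(a, mu, sigma, 20000, rng)

theory_mean = sum(ai * mi for ai, mi in zip(a, mu))            # sum a_i mu_i
theory_var = sum(ai**2 * si**2 for ai, si in zip(a, sigma))    # sum a_i^2 sigma_i^2
print(theory_mean, statistics.fmean(values))
print(theory_var, statistics.variance(values))
```

The simulated mean and variance should lie close to the theoretical values, differing only by sampling error.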
EXERCISES

6.1. Determine a fifth linear form orthogonal to the forms in Examples
6.4 and 6.5. How many linear forms make a complete set for a sample
of n?

6.2. Given a random sample of N, all members being drawn from the
same population. Determine the relationship between the cumulants
for the total of the sample and the cumulants for a single member of the
sample. Hint:

$$m_{\Sigma X}(t) = [m_X(t)]^N.$$

6.3. Use the results of Exercise 6.2 to determine the first three
cumulants for the total of N from a binomial distribution. (N represents the
number of samples and n the number of independent trials per sample.)

6.4. What happens to Exercise 6.2 if each member of the random
sample is drawn from a different population? Hint: Show that

$$K_{\Sigma X}(t) = \sum_{i=1}^{N} K_{X_i}(t)$$

and the rth cumulant for the sum is hence

$$\kappa_r(\Sigma X) = \sum_{i=1}^{N} \kappa_r(X_i).$$

Consider this problem for Poisson distributions with unlike means and
binomial distributions with unlike values of n and p.

6.5. Given two independent estimates of Y, $X_1$ and $X_2$, with variances
$\sigma_1^2$ and $\sigma_2^2$, respectively. If we desire to estimate Y as an unbiased linear
function of the X's, find the coefficients of the X's which will minimize
the variance of the estimate. Hint: $E(X_i) = Y$. Let $\hat{Y} = l_1X_1 + l_2X_2$.

6.6. If the $X_i$ in the linear form $l = \sum a_iX_i$ are normally and
independently distributed with means $\mu_i$ and variances $\sigma_i^2$ and all $a_i = 1/n$,
$\mu_i = \mu$, and $\sigma_i = \sigma$, then $l = \bar{X}$. Show that the mean of a normal sample
is normally distributed with mean $\mu$ and variance $\sigma^2/n$.

6.7. Suppose that $\bar{X}_1$ represents the mean of a sample of $n_1$ taken from
a normal population with mean $\mu_1$ and variance $\sigma_1^2$ and $\bar{X}_2$ the mean of a
sample of $n_2$ taken from another normal population with mean $\mu_2$ and
variance $\sigma_2^2$. Let $l = \bar{X}_1 - \bar{X}_2$. Show that l is normally distributed
with mean $\mu_1 - \mu_2$ and variance

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2}.$$

6.8. Consider the following applied problem: The effect of a nitrogen
top-dressing on a crop is to be determined. In addition it is desired to
discover at what time during the growing season the top-dressing, if
beneficial, should be applied. The experiment was designed as follows:
Use four plots, one with no nitrogen, one with nitrogen applied early,
one applied in the middle of the growing season, and one applied late.
Naturally, this entire experiment would be repeated several times, say
r, in order to discover how consistent the differences were from replication
to replication. The totals of the 4r plots are indicated as follows:

                    Nitrogen applied
    No N        Early     Middle     Late
    $T_0$         $T_1$       $T_2$        $T_3$

$$T_0 = X_{01} + X_{02} + \cdots + X_{0r}, \qquad T_1 = \sum_{j=1}^{r} X_{1j}, \quad\text{etc.}$$

Suppose $X_{ij}$ is $N(\mu_i, \sigma^2)$, where $\mu_i$ is the mean effect of the ith
treatment. Three pertinent independent linear forms are:
$l_1 = -3T_0 + T_1 + T_2 + T_3$, for determining the effect of nitrogen;
$l_2 = T_3 - T_1$, for determining the linear effect; $l_3 = T_1 - 2T_2 + T_3$,
for determining the quadratic effect. Find a fourth linear form to complete
the set and determine the variance of each.
CHAPTER 7


DERIVED SAMPLING DISTRIBUTIONS: NORMAL PARENT 

POPULATION 

7.1. Introduction. In this chapter we shall consider only random
samples of n drawn from a normal parent population with mean $\mu$ and
variance $\sigma^2$, that is, from $N(\mu,\sigma^2)$. If it is desired to indicate that two or
more variates are also independently normally distributed with mean $\mu$
and variance $\sigma^2$, we shall denote this by

NID($\mu$, $\sigma^2$).

The derived sampling distributions which will be discussed here are of 
importance in applied statistics in making tests of hypotheses and setting 
confidence limits. As pointed out earlier, we shall assume in these
discussions of distribution theory that certain estimators of population
parameters and certain test criteria are the “best,” and we shall be 
interested here in obtaining their probability distributions. In later 
chapters on tests of hypotheses and estimation, a discussion will be given 
of what properties a “best” estimator or an “appropriate” test criterion 
should have. 

Methods of obtaining test criteria and setting confidence limits will 
now be explained in order that an understanding of the uses of these 
important statistical methods may be obtained simultaneously with the 
derivations of the related sampling distributions. The methods described 
below do not take into account important considerations described in 
detail in subsequent chapters on estimation and tests of hypotheses, but 
for the cases presented the corresponding techniques will be the same. 

If it be possible to find the sampling distribution of a function of an
estimator and its corresponding population parameter which is
independent of the parameter and all other unknown parameters, then
hypotheses specifying numerical values of these parameters may be tested.
Such a hypothesis is often called a null hypothesis, and the function called 
a test criterion. 

By making probability statements regarding the test criterion in the 
form of inequalities, and then solving these inequalities for the population 
parameter, one may set confidence limits for the particular parameter. 

Illustrations for the techniques described in the above two paragraphs
will be presented in subsequent sections of this chapter.

If the specified parent population distribution is not normal, then the
derived sampling distributions may become unduly complicated.
Nonrandomness in the sample presents additional difficulties. Consequences
of the relaxation of such assumptions regarding the parent population and
the sample have been studied to some extent, and further investigations
may be expected in the future.

7.2. Distribution of the Sample Mean, $\bar{X}$. In Sec. 6.7 it was shown,
for a random sample of size n, with each $X_i$ from a parent population
$N(\mu_i, \sigma_i^2)$, that the distribution of $l = \sum_{i=1}^{n} a_iX_i$ is
$N\big(\sum_{i=1}^{n} a_i\mu_i,\ \sum_{i=1}^{n} a_i^2\sigma_i^2\big)$. If $a_i = 1/n$, $\mu_i = \mu$, and $\sigma_i^2 = \sigma^2$, then
$\sum a_iX_i = \bar{X}$, $\sum a_i\mu_i = \mu$, and $\sum a_i^2\sigma_i^2 = \sigma^2/n$. Therefore the
distribution of $\bar{X}$ is $N(\mu, \sigma^2/n)$. Let

$$T = \frac{\bar{X} - \mu}{\sigma/\sqrt{n}}.$$

The sampling distribution of T is N(0,1), which is symmetrical about
zero. Then, if $\sigma$ be known, T fulfills the requirements set out in Sec. 7.1
for a test criterion which may be used to test a hypothesis specifying a
numerical value for $\mu$, say $\mu = \mu_0$.

Suppose a random sample of size n is drawn from the assumed
population, $N(\mu_0,\sigma^2)$. The sample mean, $\bar{X}$, and a corresponding value of T,
called $T_0$, are calculated, where

$$T_0 = \frac{\bar{X} - \mu_0}{\sigma/\sqrt{n}}.$$

It is possible, using Table 15 in the Appendix, to find the probability of T
being greater in absolute value than the calculated $T_0$. Should this
probability be as small as or smaller than some preassigned small numerical
value, $\alpha$, we say that we have evidence that the hypothesis being tested
is not true, or we should be forced to accept the occurrence of a very
unlikely event. This $\alpha$ is called the significance probability, and the
entire testing procedure a test of significance or a test of the hypothesis that
$\mu = \mu_0$. A more precise mathematical treatment of tests of hypotheses
will be presented in Chap. 11.

The validity of the procedure set out in the above paragraph may be
checked empirically by constructing a simulated population, $N(\mu_0,\sigma^2)$,
drawing a large number of samples of size n, and finding the empirical
distribution of the values of $T_0$ obtained from these samples of size n. It
would be found that approximately $\alpha$ of the calculated $T_0$ values would
lie outside of the corresponding tabular T values (usually denoted as
$-T_\alpha$ and $T_\alpha$).
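
The empirical check described above can be sketched with a pseudo-random number generator standing in for the simulated population. This is an illustration only: the function name, the number of repetitions, the seed, and the particular values of $\mu_0$, $\sigma^2$, and n are assumptions; 1.96 is the usual two-sided 5 per cent point of N(0,1).

```python
import math
import random
import statistics

def rejection_rate(mu0, sigma, n, reps, t_alpha, rng):
    """Fraction of simulated samples whose |T_0| exceeds t_alpha
    when the hypothesis mu = mu0 is in fact true."""
    rejections = 0
    for _ in range(reps):
        sample = [rng.gauss(mu0, sigma) for _ in range(n)]
        t0 = (statistics.fmean(sample) - mu0) / (sigma / math.sqrt(n))
        if abs(t0) > t_alpha:
            rejections += 1
    return rejections / reps

rng = random.Random(0)   # fixed seed so the sketch is reproducible
# Illustrative values (assumed): mu0 = 50, sigma^2 = 90, n = 9, T_.05 = 1.96.
rate = rejection_rate(50.0, math.sqrt(90.0), 9, 20000, 1.96, rng)
print(rate)              # should be close to alpha = .05
```

The printed proportion should be near .05, differing from it only by the sampling error of the simulation.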

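The probability, $1 - \alpha$, that T lies between $-T_\alpha$ and $T_\alpha$ may be
expressed as $\mathcal{P}[-T_\alpha < T < T_\alpha] = 1 - \alpha$, or

$$\frac{1}{\sqrt{2\pi}}\int_{-T_\alpha}^{T_\alpha} e^{-y^2/2}\,dy = 1 - \alpha.$$

It follows that

$$\mathcal{P}[T < T_\alpha] = 1 - \frac{\alpha}{2},$$

or

$$\mathcal{P}\Big[\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} < T_\alpha\Big] = 1 - \frac{\alpha}{2}.$$

Solving for $\mu$, we find

$$\mathcal{P}\Big[\mu > \bar{X} - \frac{T_\alpha\sigma}{\sqrt{n}}\Big] = 1 - \frac{\alpha}{2}.$$

Similarly,

$$\mathcal{P}\Big[\mu < \bar{X} + \frac{T_\alpha\sigma}{\sqrt{n}}\Big] = 1 - \frac{\alpha}{2}.$$

Hence,

$$\mathcal{P}\Big[\bar{X} - \frac{T_\alpha\sigma}{\sqrt{n}} < \mu < \bar{X} + \frac{T_\alpha\sigma}{\sqrt{n}}\Big] = 1 - \alpha. \tag{1}$$

The symbol $\mathcal{P}[T < T_\alpha]$ is read “the probability of T being less than $T_\alpha$.”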

The inequality set out in (1) above provides a confidence interval for $\mu$
at the $(1 - \alpha)$ confidence probability level. The validity of the procedure
described may be checked in a similar manner as that suggested for the
test criterion in preceding paragraphs. It should be noted that $\mu$ is a
population parameter and does not have a sampling distribution but is
some fixed though unknown quantity. For some sample in hand, (1)
above is not an a priori probability statement but indicates an average
outcome to be expected in repeated sampling. This kind of probability
is called a confidence probability. A more precise mathematical
formulation of these concepts will be presented in Chap. 10.

Values of $(1 - \alpha/2)$ are given in Table 16 in the Appendix for y ranging
from 0 in steps of .01 to 3.49. The reader must interpolate in order to
determine $y = T_\alpha$ for a specified value of $\alpha$ (see the section on Explanation
of the Tables preceding the tables). Values of $T_\alpha$ are given at the bottom
of Table lb in the Appendix for a few selected values of $\alpha$.

Example 7.1. Given a normal population of yields with variance
$\sigma^2 = 90$. The following nine sample values were obtained from this
population: 65, 45, 43, 40, 64, 58, 52, 56, 63. The mean of this sample is
$\bar{X} = 54$. On the basis of previous experience it was believed that the
true mean, $\mu$, was 50. Using a significance probability of $\alpha = .05$, does
the sample contradict this presumption? We find that

$$T_0 = \frac{54 - 50}{\sqrt{10}} = 1.26.$$

From Table 16, we see that $T_{.05} = 1.96$. Since $T_0$ is less than $T_{.05}$, we
cannot contradict the presumption that $\mu = 50$ on the basis of this
sample.

On the basis of this sample, in the absence of any knowledge of $\mu$, we
could set up confidence limits for $\mu$. The 95 per cent confidence limits
are

$$54 - (1.96)(3.16) < \mu < 54 + (1.96)(3.16),$$

or

$$47.8 < \mu < 60.2.$$

This states that from our sample of nine values, we infer that the true
mean lies between 47.8 and 60.2 with a confidence probability of 95 per
cent.
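
The arithmetic of Example 7.1 can be reproduced in a few lines. The sketch below is illustrative only; the variable names are assumptions, and the normal point is computed with the standard library's `statistics.NormalDist` rather than read from the Appendix table.

```python
import math
from statistics import NormalDist, fmean

yields = [65, 45, 43, 40, 64, 58, 52, 56, 63]
sigma2 = 90.0    # known population variance
mu0 = 50.0       # hypothesized mean
alpha = 0.05

n = len(yields)
xbar = fmean(yields)
se = math.sqrt(sigma2 / n)                       # sigma / sqrt(n) = sqrt(10)
t0 = (xbar - mu0) / se
t_alpha = NormalDist().inv_cdf(1 - alpha / 2)    # two-sided point, about 1.96

lower = xbar - t_alpha * se
upper = xbar + t_alpha * se
print(round(t0, 2), round(t_alpha, 2))
print(round(lower, 1), round(upper, 1))
```

Since the computed $T_0$ is smaller than the tabular point, the hypothesis is not contradicted, and the printed limits agree with those of the example after rounding.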

7.3. Law of Large Numbers. In the preceding section it was shown,
for a random sample of n from an arbitrary parent population, that
$E[\bar{X}] = \mu$ and $\sigma_{\bar{X}}^2 = \sigma^2/n$. It follows that whatever the form of the
parent population distribution (provided the variance is finite), the
distribution of the sample mean becomes more and more concentrated about
the population mean as the size of the sample increases. It is evident
then that as the size of the sample is increased, the more confident we can
be that the sample mean provides a “good” estimate of the population
mean. Essentially this is the meaning of the law of large numbers. A
more precise statement of this property is provided by Tchebysheff’s
inequality, which will be derived in Chap. 8.

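7.4. The Central-limit Theorem. This theorem states that:

If an arbitrary population distribution has a mean $\mu$ and finite variance $\sigma^2$,
then the distribution of the sample mean approaches the normal distribution
with mean $\mu$ and variance $\sigma^2/n$ as the sample size n increases.

The proof of this important theorem is beyond the scope of this text
except for a distribution possessing a moment generating function. Now,
if Y is distributed as $N(\mu,\sigma^2)$ and $u = (Y - \mu)/\sigma$, then

$$m_u(t) = E[e^{t(Y-\mu)/\sigma}] = e^{-t\mu/\sigma}E[e^{(t/\sigma)Y}]
= e^{-t\mu/\sigma}\cdot e^{\mu(t/\sigma)+\frac{1}{2}t^2} = e^{t^2/2}.$$

If X has an arbitrary distribution, and $v = (X - \mu)/\sigma$, then

$$m_v(t) = E[e^{t(X-\mu)/\sigma}] = e^{-t\mu/\sigma}E[e^{tX/\sigma}].$$

Also, the moment generating function of $w = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ is

$$m_w(t) = E[e^{t\sqrt{n}(\bar{X}-\mu)/\sigma}]
= e^{-t\sqrt{n}\,\mu/\sigma}\,E[e^{(t/\sigma\sqrt{n})(X_1+X_2+\cdots+X_n)}]
= \{e^{-\mu(t/\sigma\sqrt{n})}E[e^{tX/\sigma\sqrt{n}}]\}^n.$$

It follows that

$$m_w(t) = \Big[m_v\Big(\frac{t}{\sqrt{n}}\Big)\Big]^n,$$

so that

$$\log m_w(t) = n\log m_v\Big(\frac{t}{\sqrt{n}}\Big),$$

or, writing $\mu_3$ for the third moment of X about its mean,

$$K_w(t) = n\log\Big[1 + \frac{1}{n}\Big(\frac{t^2}{2} + \frac{1}{3!}\frac{\mu_3}{\sqrt{n}\,\sigma^3}\,t^3 + \cdots\Big)\Big].$$

Since

$$\log(1 + X) = X - \tfrac{1}{2}X^2 + \tfrac{1}{3}X^3 - \cdots,$$

we have

$$K_w(t) = n\Big[\frac{1}{n}\Big(\frac{t^2}{2} + \frac{1}{3!}\frac{\mu_3}{\sqrt{n}\,\sigma^3}\,t^3 + \cdots\Big)
- \frac{1}{2n^2}\Big(\frac{t^2}{2} + \frac{1}{3!}\frac{\mu_3}{\sqrt{n}\,\sigma^3}\,t^3 + \cdots\Big)^2 + \cdots\Big]$$

and

$$\lim_{n\to\infty} K_w(t) = \frac{t^2}{2}.$$

Hence, $K_w(t)$ approaches $K_u(t)$, and it follows that the distribution of $\bar{X}$
approaches the normal distribution with mean $\mu$ and variance $\sigma^2/n$ as the
sample size n increases.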
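
The approach to normality can be observed exactly for a simple non-normal parent population. In the sketch below, offered as an illustration only, each observation is 1 with probability p and 0 otherwise, so the sample total has a binomial distribution; the function name and the value p = 0.3 are assumptions made for the example.

```python
import math
from statistics import NormalDist

def max_cdf_gap(n, p):
    """Largest gap between the exact distribution function of the
    standardized sample mean of n such observations and that of N(0,1)."""
    q = 1.0 - p
    normal = NormalDist()
    gap, cdf = 0.0, 0.0
    for k in range(n + 1):
        z = (k - n * p) / math.sqrt(n * p * q)   # standardized value of the total k
        phi = normal.cdf(z)
        gap = max(gap, abs(cdf - phi))           # just below the jump at k
        cdf += math.comb(n, k) * p**k * q**(n - k)
        gap = max(gap, abs(cdf - phi))           # at the jump itself
    return gap

for n in (25, 100, 400):
    print(n, round(max_cdf_gap(n, 0.3), 4))
```

The printed gap should shrink as n grows, in keeping with the theorem.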

7.5. Chi-square Distribution. An important distribution that enters
into the theory of derived sampling distributions is the chi-square
distribution, defined as

$$f(\chi^2)\,d(\chi^2) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\,(\chi^2)^{(\nu/2)-1}e^{-\chi^2/2}\,d(\chi^2), \qquad 0 < \chi^2 < \infty,$$

where $\nu$ is called the degrees of freedom. It is easy to see that

$$\mathcal{P}(\chi^2 < \chi_\alpha^2) = \int_0^{\chi_\alpha^2} f(\chi^2)\,d(\chi^2)
= \frac{1}{\Gamma(\nu/2)}\int_0^{y_\alpha} y^{(\nu/2)-1}e^{-y}\,dy
= \frac{\Gamma_{y_\alpha}(\nu/2)}{\Gamma(\nu/2)},$$

which is the incomplete Gamma function, $I_{y_\alpha}(\nu/2)$, with $y_\alpha = \chi_\alpha^2/2$. This
function has been tabulated in the form $I(u,p)$ by Karl Pearson$^1$ for
various values of $u = y_\alpha/\sqrt{p+1} = \chi_\alpha^2/2\sqrt{p+1}$ and $p = (\nu/2) - 1$.
In this form, $\mathcal{P}(\chi^2 > \chi_\alpha^2) = 1 - I(u,p)$. These tables are described and
their uses in statistics explained in detail by Bancroft.$^2$

Catherine M. Thompson$^3$ gives values of $\chi_\alpha^2$ for $\alpha$ = .995, .990, .975,
.950, .900, .750, .500, .250, .100, .050, .025, .010, and .005 and $\nu$ = 1(1)30,
where $\alpha = \mathcal{P}(\chi^2 > \chi_\alpha^2)$. These values are presented in Table II in the
Appendix.
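
Where printed tables are not at hand, the probability in the upper tail of the chi-square distribution can be computed from a series for the incomplete Gamma function. The following is a minimal sketch, not the method of the tables cited above; the function names, tolerance, and term limit are assumptions.

```python
import math

def regularized_lower_gamma(a, x, tol=1e-15, max_terms=10000):
    """Ratio of the incomplete to the complete Gamma function, from the
    series e^(-x) * x^a * sum over k of x^k / Gamma(a + k + 1)."""
    if x <= 0:
        return 0.0
    term = math.exp(-x + a * math.log(x) - math.lgamma(a + 1))
    total = term
    for k in range(1, max_terms):
        term *= x / (a + k)          # ratio of successive series terms
        total += term
        if term < tol * total:
            break
    return total

def chi2_upper_tail(chi2, nu):
    """Probability that chi-square with nu degrees of freedom exceeds chi2."""
    return 1.0 - regularized_lower_gamma(nu / 2.0, chi2 / 2.0)

# 16.919 is the usual tabular 5 per cent point for 9 degrees of freedom.
print(round(chi2_upper_tail(16.919, 9), 4))
```

For two degrees of freedom the upper tail reduces to $e^{-\chi^2/2}$, which gives a simple check on the series.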

7.6. Properties of the Chi-square Distribution. The moment
generating function for $\chi^2$ is

$$m(t) = \frac{1}{\Gamma(\nu/2)}\int_0^{\infty}\Big(\frac{\chi^2}{2}\Big)^{(\nu/2)-1}
e^{-(1-2t)(\chi^2/2)}\,d\Big(\frac{\chi^2}{2}\Big) = (1 - 2t)^{-\nu/2}.$$

It follows that the cumulant generating function is

$$K(t) = -\frac{\nu}{2}\log(1 - 2t) = \frac{\nu}{2}\sum_{i=1}^{\infty}\frac{(2t)^i}{i},$$

whence

$$\kappa_i = (i - 1)!\,2^{i-1}\nu.$$

The $\chi^2$ distribution has mean $\nu$ and variance $2\nu$.

Consider a new variable $y = (\chi^2 - \nu)/\sqrt{2\nu}$, for which $\mu = 0$, $\sigma^2 = 1$.
The cumulant generating function for this new variable is

$$K_y(t) = -\frac{\nu t}{\sqrt{2\nu}} - \frac{\nu}{2}\log\Big(1 - \frac{2t}{\sqrt{2\nu}}\Big),$$

and

$$\kappa_i = \Big(\frac{2}{\nu}\Big)^{(i/2)-1}(i - 1)!, \qquad i \ge 2.$$

Then $\kappa_2 = 1$. Also, for $i > 2$,

$$\kappa_3 = \frac{2\sqrt{2}}{\sqrt{\nu}}, \qquad \kappa_4 = \frac{12}{\nu}, \qquad\text{etc.},$$

and

$$\lim_{\nu\to\infty}\kappa_i = 0, \qquad i > 2.$$

But this is a property of the normal distribution, indicating that the $\chi^2$
distribution approaches the normal distribution for large $\nu$.
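
The way the higher cumulants of the standardized variable die out can be tabulated directly from the formula $\kappa_i = (i-1)!\,2^{i-1}\nu$. The sketch below is illustrative only; the function names and the chosen values of $\nu$ are assumptions.

```python
import math

def chi2_cumulant(i, nu):
    """i-th cumulant of chi-square with nu degrees of freedom."""
    return math.factorial(i - 1) * 2 ** (i - 1) * nu

def standardized_cumulant(i, nu):
    """i-th cumulant (i >= 2) of y = (chi-square - nu) / sqrt(2 nu):
    dividing a variable by c divides its i-th cumulant by c^i."""
    return chi2_cumulant(i, nu) / (2 * nu) ** (i / 2)

for nu in (1, 10, 100, 1000):
    print(nu, round(standardized_cumulant(3, nu), 4), round(standardized_cumulant(4, nu), 4))
```

The third and fourth cumulants fall toward zero as $\nu$ increases, while the second stays equal to one.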

7.7. Distribution of a Sum of Squares of Deviations. Given a sample
$\{X_i\}$ of n from the population of X's which are NID($\mu$, $\sigma^2$). Set

$$u_i^2 = (X_i - \mu)^2/\sigma^2.$$

We wish to find the distribution of

$$\sum_{i=1}^{n} u_i^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2}.$$

Now, since the X's are independent,

$$m_{\Sigma u^2}(t) = m_1(t)\cdot m_2(t)\cdots m_n(t),$$

where $m_i(t) = m_{u_i^2}(t)$. Hence, since all the X's have the same distribution
function,

$$m_{\Sigma u^2}(t) = [m_1(t)]^n.$$

Now

$$m_1(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty} e^{tu^2}e^{-u^2/2}\,du = (1 - 2t)^{-1/2}.$$

Hence,

$$m_{\Sigma u^2}(t) = (1 - 2t)^{-n/2}.$$

But this is the moment generating function of a $\chi^2$ with $\nu = n$ degrees of
freedom; hence

$$\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2}$$

is distributed as $\chi^2$ with $\nu = n$ degrees of freedom. In this case each X
furnishes a single degree of freedom; hence, the number of degrees of
freedom is the number of independent values which make up $\chi^2$. The
number of degrees of freedom in general is the total number of observations
less the number of independent restraints imposed on the observations in
forming the distribution.

This $\chi^2$ distribution can be used to test hypotheses and to set up
confidence intervals concerning the population variance, $\sigma^2$, assuming the
population mean, $\mu$, is known. A sample value of $\chi^2$ is computed from a
sample of n as follows:

$$\chi_0^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma_0^2}.$$

A significance probability, say $\alpha$, is decided upon, and $\chi_0^2$ is then compared
with the corresponding tabular value, $\chi_\alpha^2$, where

$$\mathcal{P}(\chi^2 > \chi_\alpha^2) = \alpha.$$

If $\chi_0^2$ is greater than $\chi_\alpha^2$, we conclude that $\sigma^2$ is greater than the assumed
$\sigma_0^2$. It should be noted here that we can only test the hypothesis $\sigma^2 = \sigma_0^2$
against the alternative that $\sigma^2 > \sigma_0^2$. If it is not known that $\sigma^2$ is at least
as large as $\sigma_0^2$, a more general testing procedure is needed; this will be
discussed in later chapters.

The construction of a confidence interval for $\sigma^2$ is complicated by the fact
that the $\chi^2$ distribution is not symmetrical. Hence it is not clear that an
equal amount of probability should be put outside each confidence limit.
This problem is discussed in detail in Chap. 10. If the same amount of
probability ($\alpha/2$) is put outside each confidence limit, the confidence limits
for $\sigma^2$ are

$$\frac{n(s')^2}{\chi_2^2} < \sigma^2 < \frac{n(s')^2}{\chi_1^2},$$

where $n(s')^2 = \sum_{i=1}^{n} (X_i - \mu)^2$, $\mathcal{P}(\chi^2 > \chi_2^2) = \alpha/2$, and
$\mathcal{P}(\chi^2 < \chi_1^2) = \alpha/2$.

Example 7.2. Suppose in Example 7.1 that the true mean of the
population was $\mu = 50$ but that $\sigma^2$ was unknown. In this case

$$9(s')^2 = \sum (X_i - \mu)^2 = 868.$$

If we want to test the hypothesis that $\sigma^2 = 90$, we use the test criterion

$$\chi_0^2 = \frac{868}{90} = 9.64,$$

to be compared with $\chi_\alpha^2$ for 9 degrees of freedom. If we use $\alpha = .05$,
$\chi_\alpha^2 = 16.9$. Hence we cannot conclude that $\sigma^2$ is greater than 90 on the
basis of this sample. We see that the 95 per cent confidence limits are

$$\frac{868}{19.0} < \sigma^2 < \frac{868}{2.70} \qquad\text{or}\qquad 46 < \sigma^2 < 321.$$
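
The figures of Example 7.2 can be reproduced as follows. This is a sketch for illustration only; the variable names are assumptions, and the three chi-square points for 9 degrees of freedom are entered as constants taken from a standard table rather than computed.

```python
yields = [65, 45, 43, 40, 64, 58, 52, 56, 63]
mu = 50           # population mean, taken as known in this example
sigma0_sq = 90    # hypothesized variance

ss = sum((x - mu) ** 2 for x in yields)    # n(s')^2, the sum of squares about mu
chi2_0 = ss / sigma0_sq

# Tabular chi-square points for 9 degrees of freedom (standard table values).
chi2_upper_05 = 16.9     # P(chi2 > 16.9) = .05
chi2_upper_025 = 19.0    # P(chi2 > 19.0) = .025
chi2_lower_025 = 2.70    # P(chi2 < 2.70) = .025

print(ss, round(chi2_0, 2), chi2_0 > chi2_upper_05)
print(round(ss / chi2_upper_025), round(ss / chi2_lower_025))
```

The test criterion falls below the 5 per cent point, and the rounded limits agree with those stated in the example.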

7.8. Reproductive Property of the Chi-square Distribution. Using
the relationship

$$\chi^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2},$$

we may prove that the sum of n variates, each independently distributed
as $\chi^2$ with $\nu_i$ degrees of freedom, is itself distributed as $\chi^2$ with $\nu = \sum \nu_i$
degrees of freedom. Suppose

$$\chi_1^2 = \frac{\sum_{i=1}^{\nu_1} (X_{1i} - \mu_1)^2}{\sigma_1^2}, \qquad \chi_2^2 = \frac{\sum_{i=1}^{\nu_2} (X_{2i} - \mu_2)^2}{\sigma_2^2}.$$

Then $\chi^2 = \chi_1^2 + \chi_2^2$ has the moment generating function

$$m_{\chi^2}(t) = [m_{\chi_1^2}(t)][m_{\chi_2^2}(t)] = (1 - 2t)^{-\nu_1/2}(1 - 2t)^{-\nu_2/2} = (1 - 2t)^{-(\nu_1+\nu_2)/2}.$$

But this is the moment generating function for a chi-square with $(\nu_1 + \nu_2)$
degrees of freedom; hence, $\chi_1^2 + \chi_2^2$ is distributed similarly. This result
may be extended to n variates.
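The reproductive property can be checked roughly by simulation. The sketch below is only an illustration under stated assumptions: the degrees of freedom (3 and 5), the seed, and the number of trials are arbitrary choices, and the check relies on the standard facts that a chi-square variate with $\nu$ degrees of freedom has mean $\nu$ and variance $2\nu$.

```python
import random

# Sketch: simulate the sum of two independent chi-square variates and compare
# its sample mean and variance with those of a chi-square having v1 + v2
# degrees of freedom (mean v1 + v2, variance 2(v1 + v2)).

def chi_square_variate(df, rng):
    """One chi-square variate built as a sum of df squared N(0,1) deviates."""
    return sum(rng.gauss(0.0, 1.0) ** 2 for _ in range(df))


rng = random.Random(12345)          # fixed seed so the run is repeatable
v1, v2, trials = 3, 5, 20000        # arbitrary illustrative choices

sums = [chi_square_variate(v1, rng) + chi_square_variate(v2, rng)
        for _ in range(trials)]
sample_mean = sum(sums) / trials
sample_var = sum((x - sample_mean) ** 2 for x in sums) / (trials - 1)

print("sample mean:", sample_mean, "expected:", v1 + v2)
print("sample variance:", sample_var, "expected:", 2 * (v1 + v2))
```

Agreement of two moments does not prove the result, of course; the proof is the moment generating function argument above.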

7.9. Simultaneous Distribution of the Sample Mean $\bar X$ and Variance
Estimate, $s^2$. It was shown in Sec. 7.7 that

$$\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2}$$

is distributed as chi-square with $\nu = n$ degrees of freedom. We shall now
obtain the simultaneous distribution of two parts of this sum when
expressed as

$$\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} = \sum_{i=1}^{n} \frac{(X_i - \bar X)^2}{\sigma^2} + \frac{n(\bar X - \mu)^2}{\sigma^2} = \frac{(n-1)s^2}{\sigma^2} + \frac{n(\bar X - \mu)^2}{\sigma^2},$$

where it will be recalled that $\bar X$ is an unbiased estimate of $\mu$ and $s^2$ is an
unbiased estimate of $\sigma^2$.

In order to simplify the notation in the following derivation, let
$u_i = (X_i - \mu)/\sigma$, so that $u_i$ is N(0,1) and $\bar u = (\bar X - \mu)/\sigma$. Let $\gamma^2 = s^2/\sigma^2$.

The derivation will be accomplished by making use of orthogonal linear
forms as discussed in Sec. 6.6. To this end we set up the following
orthogonal linear transformation:

$$y_1 = \frac{u_1 + u_2 + \cdots + u_n}{\sqrt{n}},$$

$$y_2 = \frac{u_1 - u_2}{\sqrt{2}},$$

$$y_3 = \frac{u_1 + u_2 - 2u_3}{\sqrt{6}},$$

$$\cdots$$

$$y_n = \frac{u_1 + u_2 + \cdots + u_{n-1} - (n-1)u_n}{\sqrt{n(n-1)}}.$$

It is easily seen that each $y_i$ is orthogonal to each $y_j$ ($i \ne j$). Also the
denominators have been chosen to make the variances of the y's equal
unity. This type of transformation is called completely orthogonal, and,
writing $y_i = \sum_k c_{ik} u_k$, these requirements may be designated as

$$\sum_{k=1}^{n} c_{ik} c_{jk} = \begin{cases} 0, & i \ne j, \\ 1, & i = j. \end{cases}$$


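From Sec. 6.7 we know that the y's are NID(0,1); hence, the simultaneous
distribution of the $\{y_i\}$ is identical with that of the $\{u_i\}$, and

$$\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} u_i^2.$$

Also, since $y_1^2 = n\bar u^2$, then

$$\sum_{i=2}^{n} y_i^2 = \sum_{i=1}^{n} u_i^2 - n\bar u^2 = \sum_{i=1}^{n} (u_i - \bar u)^2 = (n-1)\gamma^2.$$

Since $y_1$ is independent of the other y's, $\bar u$ is independent of $\gamma^2$. Hence

$$f(\bar u, \gamma^2)\, d\bar u\, d(\gamma^2) = f_1(\bar u) f_2(\gamma^2)\, d\bar u\, d(\gamma^2).$$

We know that

$$f_1(\bar u)\, d\bar u = \sqrt{\frac{n}{2\pi}}\; e^{-n\bar u^2/2}\, d\bar u.$$

It is also known that

$$\sum_{i=2}^{n} y_i^2$$

is distributed as chi-square with $(n-1)$ degrees of freedom since there
are $(n-1)$ independent normal variates. Now, $\chi^2 = (n-1)\gamma^2$; hence

$$f_2(\gamma^2)\, d(\gamma^2) = \frac{\left(\frac{n-1}{2}\right)^{(n-1)/2}}{\Gamma\left(\frac{n-1}{2}\right)} (\gamma^2)^{(n-3)/2} e^{-(n-1)\gamma^2/2}\, d(\gamma^2).$$

Using the original X's, which were $N(\mu,\sigma^2)$, the simultaneous distribution
of $\bar X$ and $s^2$ is

$$f(\bar X, s^2)\, d\bar X\, d(s^2) = \frac{\sqrt{n}\left(\frac{n-1}{2}\right)^{(n-1)/2}}{\sqrt{2\pi}\,\sigma^3\,\Gamma\left(\frac{n-1}{2}\right)} \left(\frac{s^2}{\sigma^2}\right)^{(n-3)/2} e^{-[n(\bar X-\mu)^2 + (n-1)s^2]/2\sigma^2}\, d\bar X\, d(s^2).$$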
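The properties of the orthogonal transformation used in this section can be verified numerically. The following sketch builds the coefficients of $y_1, \ldots, y_n$ as given above and checks, for one arbitrary set of values $u_i$, that the rows are orthonormal, that $\sum y_i^2 = \sum u_i^2$, that $y_1^2 = n\bar u^2$, and that $\sum_{i \ge 2} y_i^2 = \sum (u_i - \bar u)^2$. The function names and the numbers in `u` are illustrative assumptions, not part of the text.

```python
import math

# Sketch: numerical check of the orthogonal transformation of Sec. 7.9.

def transformation_rows(n):
    """Coefficient rows of y_1, ..., y_n in terms of u_1, ..., u_n."""
    rows = [[1.0 / math.sqrt(n)] * n]                     # y_1
    for k in range(2, n + 1):                             # y_2, ..., y_n
        scale = math.sqrt(k * (k - 1))
        rows.append([1.0 / scale] * (k - 1) + [-(k - 1) / scale] + [0.0] * (n - k))
    return rows


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


u = [0.3, -1.2, 0.8, 2.1, -0.5]        # arbitrary illustrative values
n = len(u)
rows = transformation_rows(n)
y = [dot(row, u) for row in rows]

# Largest departure from the requirements: 0 for i != j, 1 for i = j.
max_error = max(abs(dot(rows[i], rows[j]) - (1.0 if i == j else 0.0))
                for i in range(n) for j in range(n))

u_bar = sum(u) / n
sum_u_sq = sum(x * x for x in u)
sum_y_sq = sum(v * v for v in y)
within = sum((x - u_bar) ** 2 for x in u)
rest_y_sq = sum(v * v for v in y[1:])

print(max_error, sum_u_sq - sum_y_sq, y[0] ** 2 - n * u_bar ** 2, rest_y_sq - within)
```

All four printed quantities should be zero apart from rounding error.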


7.10. Distribution of t. It is now proposed to obtain the distribution
of a test criterion, independent of $\sigma^2$, to be used in testing the hypothesis
that a sample $\{X_i\}$ was drawn from a normal population with a specified
mean $\mu$. The distribution must be independent of $\sigma^2$, if it is to be of
practical use, since we seldom know the value of $\sigma^2$.

One proposed test criterion of this hypothesis is

$$t = \frac{\bar X - \mu}{s/\sqrt{n}},$$

where $\bar X$ and s are computed from a random sample of n from the parent
population $N(\mu,\sigma^2)$. Two properties of t should be noted: (i) t is the
ratio between a normal deviate and the square root of an unbiased
estimate of its variance, (ii) $t^2$ is the ratio between the square of a variate,
$y = \sqrt{n}(\bar X - \mu)/\sigma$, which is N(0,1), and a variate $s^2/\sigma^2$, which is
distributed as $\chi^2/\nu$ with $\nu$ degrees of freedom. Note that $\sigma^2$ cancels in the
derivation of t. We shall take (ii) as the definition of t.

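From Sec. 7.9

$$f(\bar X, s^2)\, d\bar X\, d(s^2) = k \left(\frac{s^2}{\sigma^2}\right)^{(n-3)/2} e^{-[n(\bar X-\mu)^2 + (n-1)s^2]/2\sigma^2}\, d\bar X\, d(s^2),$$

$$-\infty < \bar X < \infty, \qquad 0 < s^2 < \infty,$$

where

$$k = \frac{\sqrt{n}\left(\frac{n-1}{2}\right)^{(n-1)/2}}{\sqrt{2\pi}\,\sigma^3\,\Gamma\left(\frac{n-1}{2}\right)}.$$

Writing $f(\bar X, s^2)$ as a function of t and $s^2$, we obtain

$$f(t, s^2)\, d(s^2)\, dt = k' (s^2)^{(n-2)/2} e^{-\frac{(n-1)s^2}{2\sigma^2}\left(1 + \frac{t^2}{n-1}\right)}\, d(s^2)\, dt.$$

Integrating with respect to $s^2$, with

$$u = \frac{(n-1)s^2}{2\sigma^2}\left(1 + \frac{t^2}{n-1}\right),$$

we obtain

$$f(t)\, dt = k'' \left(1 + \frac{t^2}{n-1}\right)^{-n/2} \left[\int_0^\infty u^{(n/2)-1} e^{-u}\, du\right] dt, \qquad -\infty < t < \infty.$$

Since the area under the curve must be unity for a probability
distribution and because of symmetry, we have

$$2k''\,\Gamma\left(\frac{n}{2}\right) \int_0^\infty \left(1 + \frac{t^2}{n-1}\right)^{-n/2} dt = 1.$$

Let

$$t^2 = (n-1)v;$$

then

$$k''\,\Gamma\left(\frac{n}{2}\right)(n-1)^{1/2} \int_0^\infty (1 + v)^{-n/2}\, v^{-1/2}\, dv = 1.$$

Let

$$y = \frac{v}{1 + v};$$

then

$$k''\,\Gamma\left(\frac{n}{2}\right)(n-1)^{1/2} \int_0^1 y^{(1/2)-1} (1 - y)^{[(n-1)/2]-1}\, dy = 1.$$

Hence

$$k''\,\Gamma\left(\frac{n}{2}\right)(n-1)^{1/2}\, B\left(\frac{1}{2}, \frac{n-1}{2}\right) = 1$$

and

$$k'' = \frac{1}{\sqrt{\pi(n-1)}\;\Gamma\left(\frac{n-1}{2}\right)}.$$

The distribution becomes

$$f_{n-1}(t)\, dt = \frac{\Gamma(n/2)}{\sqrt{\pi(n-1)}\;\Gamma\left(\frac{n-1}{2}\right)} \left(1 + \frac{t^2}{n-1}\right)^{-n/2} dt, \qquad -\infty < t < \infty.$$

Since $s^2$ has $(n-1)$ degrees of freedom, $f_{n-1}(t)\,dt$ is designated as the
distribution of t for $(n-1)$ degrees of freedom. The distribution for n
degrees of freedom is

$$f_n(t)\, dt = \frac{\Gamma\left(\frac{n+1}{2}\right)}{\sqrt{\pi n}\;\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} dt, \qquad -\infty < t < \infty.$$

It should be noted that

$$P[|t| > t_\alpha] = 2\int_{t_\alpha}^{\infty} f_n(t)\, dt = \alpha.$$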
If it is assumed that $\{X_i\}$ is a random sample of n from a normal
population, we can test the hypothesis that the true mean of this
population is $\mu_0$ by use of t at, say, the $\alpha$ significance probability level. We
compute

$$t_0 = \sqrt{n}(\bar X - \mu_0)/s,$$

where $\bar X = \sum X_i/n$ and $s = \sqrt{\sum (X_i - \bar X)^2/(n-1)}$. Then a tabular
value of $t_\alpha$ is obtained for $(n-1)$ degrees of freedom, and $|t_0|$ is compared
with $t_\alpha$. If $|t_0|$ is greater than $t_\alpha$, there is evidence that the true mean, $\mu$,
is different from $\mu_0$. If $|t_0|$ is less than $t_\alpha$, there is no evidence that $\mu$
is different from $\mu_0$.

Similarly if $\bar X$ is used as an estimate of $\mu$ and

$$s^2 = \sum_{i=1}^{n} (X_i - \bar X)^2/(n-1)$$

as an estimate of $\sigma^2$, the two-tailed confidence limits for $\mu$ are

$$\bar X - t_\alpha \frac{s}{\sqrt{n}} < \mu < \bar X + t_\alpha \frac{s}{\sqrt{n}},$$

at the $(1 - \alpha)$ confidence probability level.

Values of $t_\alpha$ are given for various values of $\alpha$ and various degrees of
freedom in Table III in the Appendix. For the location of other tables
and methods of obtaining exact values, see Bancroft.² Note that the
tabular values are for a two-tailed test. If it is desired to use $P(t > t_0)$,
the tabular probabilities must be divided by 2.
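The test and the confidence limits just described can be sketched as follows. The sample and the hypothetical mean $\mu_0 = 50$ are invented for illustration, the function names are not from the text, and the tabular value 2.306 (two-tailed, $\alpha = .05$, 8 degrees of freedom) is entered by hand as it would be read from a t table.

```python
import math

# Sketch: one-sample t criterion and two-tailed confidence limits for the mean.

def one_sample_t(sample, mu0):
    """Return (mean, s, t0) with t0 = sqrt(n) (mean - mu0) / s."""
    n = len(sample)
    mean = sum(sample) / n
    s = math.sqrt(sum((x - mean) ** 2 for x in sample) / (n - 1))
    return mean, s, math.sqrt(n) * (mean - mu0) / s


def mean_limits(mean, s, n, t_alpha):
    """Two-tailed limits: mean -/+ t_alpha * s / sqrt(n)."""
    half_width = t_alpha * s / math.sqrt(n)
    return mean - half_width, mean + half_width


sample = [52, 48, 55, 49, 51, 53, 47, 50, 54]   # hypothetical observations
t_alpha = 2.306                                 # tabular value, 8 d.f., alpha = .05

mean, s, t0 = one_sample_t(sample, mu0=50)
lower, upper = mean_limits(mean, s, len(sample), t_alpha)
evidence_of_difference = abs(t0) > t_alpha

print(mean, round(s, 4), round(t0, 4), evidence_of_difference)
print(round(lower, 3), round(upper, 3))
```

Here $|t_0|$ is about 1.10, well below the tabular value, and the confidence limits accordingly include the hypothetical mean 50; the test and the interval always agree in this way.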

7.11. Test of the Difference between Two Means. Another important
use of the t distribution is in testing a hypothesis concerning the
difference of two population means. Suppose that we have two random
samples $\{X_{1i}\}$ and $\{X_{2i}\}$ from populations $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$,
respectively, with $\sigma_1^2 = \sigma_2^2 = \sigma^2$. We wish to test the hypothesis that $\mu_1 = \mu_2$.
The procedure in constructing a suitable test criterion will be to (i) find
a function of the observations whose distribution is independent of $\sigma^2$
and which involves the difference between the population means $\mu_1$ and
$\mu_2$ and (ii) find the sampling distribution of the function on the
assumption that the hypothesis tested is true, for example, in this case, $\mu_1 = \mu_2$.
Let the samples have the following characteristics:

    Sample          Size      Mean          Variance
    $\{X_{1i}\}$    $n_1$     $\bar X_1$    $s_1^2$
    $\{X_{2i}\}$    $n_2$     $\bar X_2$    $s_2^2$

Now, $(n_1 - 1)s_1^2/\sigma_1^2$ is distributed as chi-square with $(n_1 - 1)$ degrees of
freedom, and $(n_2 - 1)s_2^2/\sigma_2^2$ is distributed as chi-square with $(n_2 - 1)$
degrees of freedom. Since $\sigma_1^2 = \sigma_2^2 = \sigma^2$,

$$\frac{(n_1 - 1)s_1^2}{\sigma_1^2} + \frac{(n_2 - 1)s_2^2}{\sigma_2^2} = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{\sigma^2}.$$

By the reproductive property of chi-square, assuming the two samples
are independently drawn, this quantity is distributed as chi-square with
$(n_1 + n_2 - 2)$ degrees of freedom. But

$$s^2 = \frac{(n_1 - 1)s_1^2 + (n_2 - 1)s_2^2}{n_1 + n_2 - 2}$$

is an unbiased estimate of $\sigma^2$ since its expected value is $\sigma^2$. Also the
variance of $(\bar X_1 - \mu_1) - (\bar X_2 - \mu_2)$ is

$$\sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right).$$

The quantity

$$\frac{(\bar X_1 - \mu_1) - (\bar X_2 - \mu_2)}{s\sqrt{(1/n_1) + (1/n_2)}} = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{s\sqrt{(1/n_1) + (1/n_2)}}$$

is the ratio of a normal deviate to the square root of an unbiased estimate
of its variance. Writing the square of the quantity as

$$\frac{[(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)]^2 \Big/ \sigma^2\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}{s^2/\sigma^2},$$

we see that it is now the ratio of the square of a variate which is N(0,1)
and a variate which is distributed as $\chi^2/(n_1 + n_2 - 2)$ with $(n_1 + n_2 - 2)$
degrees of freedom. But the second property is the definition of a
quantity distributed as t. Hence, the test criterion on the assumption
that the hypothesis tested ($\mu_1 = \mu_2$) is true is

$$t = \frac{\bar X_1 - \bar X_2}{s\sqrt{(1/n_1) + (1/n_2)}}.$$

The two-tailed confidence limits for the true difference between the
two means, $\mu_1 - \mu_2 = \delta$, are given by $d - t_\alpha s_d < \delta < d + t_\alpha s_d$, where

$$d = \bar X_1 - \bar X_2 \quad \text{and} \quad s_d = s\sqrt{(1/n_1) + (1/n_2)}.$$
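A brief sketch of the pooled test follows. The summary figures (sample sizes 8 and 10, means 12.0 and 10.0, variances 4.0 and 5.0) are hypothetical, the function name is illustrative, and the tabular value 2.120 (two-tailed, $\alpha = .05$, 16 degrees of freedom) is entered by hand.

```python
import math

# Sketch: pooled-variance t criterion for the difference between two means,
# assuming equal population variances.

def pooled_t(n1, mean1, var1, n2, mean2, var2):
    """Return (d, s_d, t0, degrees of freedom) for the two-sample t criterion."""
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * var1 + (n2 - 1) * var2) / df   # s^2
    s_d = math.sqrt(pooled_var * (1.0 / n1 + 1.0 / n2))     # s sqrt(1/n1 + 1/n2)
    d = mean1 - mean2
    return d, s_d, d / s_d, df


d, s_d, t0, df = pooled_t(8, 12.0, 4.0, 10, 10.0, 5.0)      # hypothetical summaries
t_alpha = 2.120                                             # tabular value, 16 d.f.
limits = (d - t_alpha * s_d, d + t_alpha * s_d)

print(df, round(s_d, 4), round(t0, 4))
print(round(limits[0], 3), round(limits[1], 3))
```

In this illustration $t_0$ is about 1.97, short of the tabular 2.120, and the confidence limits for $\delta$ include zero.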

If $\sigma_1^2 \ne \sigma_2^2$, then the variance of $(\bar X_1 - \mu_1) - (\bar X_2 - \mu_2)$ is

$$\frac{\sigma_1^2}{n_1} + \frac{\sigma_2^2}{n_2},$$

for which

$$\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}$$

is an unbiased estimate. Hence the first property of t is met by

$$t' = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{(s_1^2/n_1) + (s_2^2/n_2)}}$$

but not the second, since the square of the denominator is not distributed
as $\chi^2/\nu$, with $\nu$ degrees of freedom.

We know that $(n_1 - 1)s_1^2/\sigma_1^2$ is distributed as chi-square with $(n_1 - 1)$
degrees of freedom and similarly for $(n_2 - 1)s_2^2/\sigma_2^2$. Hence a denominator
analogous to that of $t^2$ is

$$\frac{(n_1 - 1)s_1^2/\sigma_1^2 + (n_2 - 1)s_2^2/\sigma_2^2}{n_1 + n_2 - 2}$$

and an analogous numerator is

$$\frac{[(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)]^2}{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}.$$

If we set

$$t = \frac{(\bar X_1 - \bar X_2) - (\mu_1 - \mu_2)}{\sqrt{(\sigma_1^2/n_1) + (\sigma_2^2/n_2)}} \Bigg/ \sqrt{\frac{(n_1 - 1)s_1^2/\sigma_1^2 + (n_2 - 1)s_2^2/\sigma_2^2}{n_1 + n_2 - 2}}$$

and let $\lambda = \sigma_2^2/\sigma_1^2$ and $l = s_2^2/s_1^2$, then

$$t' = t\; \frac{\sqrt{(n_1 - 1) + (n_2 - 1)l/\lambda}}{\sqrt{n_1 + n_2 - 2}} \cdot \frac{\sqrt{(1/n_1) + (\lambda/n_2)}}{\sqrt{(1/n_1) + (l/n_2)}}$$

is an analogous expression to the ordinary t. The exact distribution of
$t'$ is not known and even if known would be of little practical use since
it would undoubtedly involve the population parameters $\sigma_1^2$ and $\sigma_2^2$. Two
methods have been devised for using $t'$ to make the desired test. They
are described by Bartlett⁴,⁵ and Welch.⁶

Several other methods have been suggested for testing the difference
between two means when the population variances are unequal. One of
the oldest methods is the Fisher-Behrens test,⁷ which is based on Fisher’s
concept of fiducial probability. Sukhatme⁸ has prepared tables to be
used in connection with this test. Since there is considerable controversy
regarding the validity of this test, it will not be presented here.

Two approximate tests have become quite popular. One by Cochran
and Cox⁹ utilizes a weighted mean of the tabular t values for the two
samples, weighted by the two sample variances. Compute $d = \bar X_1 - \bar X_2$
and

$$s_d = \sqrt{\frac{s_1^2}{n_1} + \frac{s_2^2}{n_2}}.$$

The approximate tabular value for $t' = d/s_d$ is

$$t'_\alpha = \frac{w_1 t_{\alpha 1} + w_2 t_{\alpha 2}}{w_1 + w_2},$$

where $w_i = s_i^2/n_i$ and $t_{\alpha i}$ is $t_\alpha$ for $(n_i - 1)$ degrees of freedom.

Another approximate test was suggested by Smith¹⁰ and further
expanded by Satterthwaite.¹¹ The test criterion is also $t'$, but the
approximate tabular value is $t_\alpha$ with f degrees of freedom, where

$$f = \frac{s_d^4}{\dfrac{(s_1^2/n_1)^2}{n_1 - 1} + \dfrac{(s_2^2/n_2)^2}{n_2 - 1}}.$$

Dixon and Massey¹² have a slightly different value of f:

$$f = \frac{s_d^4}{\dfrac{(s_1^2/n_1)^2}{n_1 + 1} + \dfrac{(s_2^2/n_2)^2}{n_2 + 1}} - 2.$$

A method will be presented in the next section to test whether $\sigma_1^2 = \sigma_2^2$,
using $s_1^2$ and $s_2^2$.
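The three approximate procedures can be compared side by side. In the sketch below the summary figures are hypothetical, the function name is illustrative, and the tabular values $t_{\alpha 1} = 2.571$ (5 degrees of freedom) and $t_{\alpha 2} = 2.228$ (10 degrees of freedom) for $\alpha = .05$ are entered by hand. The degrees of freedom f will seldom be whole numbers, so the tabular $t_\alpha$ for f would ordinarily be found by interpolation.

```python
import math

# Sketch: the criterion t' together with the Cochran and Cox tabular value and
# the Smith-Satterthwaite and Dixon-Massey degrees of freedom.

def unequal_variance_summary(n1, mean1, var1, n2, mean2, var2, t_a1, t_a2):
    w1, w2 = var1 / n1, var2 / n2                 # w_i = s_i^2 / n_i
    s_d_sq = w1 + w2                              # s_d^2
    t_prime = (mean1 - mean2) / math.sqrt(s_d_sq)
    cochran_cox = (w1 * t_a1 + w2 * t_a2) / (w1 + w2)
    f_satterthwaite = s_d_sq ** 2 / (w1 ** 2 / (n1 - 1) + w2 ** 2 / (n2 - 1))
    f_dixon_massey = s_d_sq ** 2 / (w1 ** 2 / (n1 + 1) + w2 ** 2 / (n2 + 1)) - 2
    return t_prime, cochran_cox, f_satterthwaite, f_dixon_massey


# Hypothetical summaries: n1 = 6, mean 20, variance 30; n2 = 11, mean 14, variance 11.
t_prime, cochran_cox, f_satt, f_dm = unequal_variance_summary(
    6, 20.0, 30.0, 11, 14.0, 11.0, t_a1=2.571, t_a2=2.228)

print(round(t_prime, 4), round(cochran_cox, 4), round(f_satt, 4), round(f_dm, 4))
```

With these figures $t'$ is about 2.45, the Cochran and Cox tabular value is about 2.51, and the two formulas give roughly 7.1 and 7.9 degrees of freedom.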

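7.12. Distribution of F. Another very useful statistic in making tests
of hypotheses involves the ratio of two chi-squares. Let $\chi_1^2$ and $\chi_2^2$ be
independently distributed with $n_1$ and $n_2$ degrees of freedom, respectively.
Then their simultaneous or joint distribution is

$$f(\chi_1^2, \chi_2^2)\, d(\chi_1^2)\, d(\chi_2^2) = \frac{1}{4\Gamma(n_1/2)\Gamma(n_2/2)} \left(\frac{\chi_1^2}{2}\right)^{(n_1/2)-1} \left(\frac{\chi_2^2}{2}\right)^{(n_2/2)-1} e^{-\frac{1}{2}(\chi_1^2 + \chi_2^2)}\, d(\chi_1^2)\, d(\chi_2^2).$$

Let

$$F = \frac{n_2 \chi_1^2}{n_1 \chi_2^2} \quad \text{or} \quad \chi_1^2 = \frac{n_1 F \chi_2^2}{n_2}.$$

Then

$$f(F, \chi_2^2) = \frac{1}{2\Gamma(n_1/2)\Gamma(n_2/2)} \left(\frac{n_1}{n_2}\right)^{n_1/2} F^{(n_1/2)-1} \left(\frac{\chi_2^2}{2}\right)^{(n_1+n_2-2)/2} e^{-\frac{\chi_2^2}{2}\left(1 + \frac{n_1 F}{n_2}\right)}$$

and

$$f(F)\, dF = \left[\int_0^\infty f(F, \chi_2^2)\, d(\chi_2^2)\right] dF = C F^{(n_1/2)-1} \left(1 + \frac{n_1 F}{n_2}\right)^{-(n_1+n_2)/2} dF, \qquad 0 < F < \infty.$$

Since $n_1$ and $n_2$ are degrees of freedom,

$$\chi_1^2 = \frac{n_1 s_1^2}{\sigma_1^2} \quad \text{and} \quad \chi_2^2 = \frac{n_2 s_2^2}{\sigma_2^2},$$

where $s_1^2$ and $s_2^2$ are two independent sample estimates of $\sigma_1^2$ and $\sigma_2^2$,
respectively. Then

$$F = \frac{\chi_1^2/n_1}{\chi_2^2/n_2} = \frac{s_1^2/\sigma_1^2}{s_2^2/\sigma_2^2}.$$

This function provides a test of the hypothesis $\sigma_1^2 = \sigma_2^2 = \sigma^2$. Then

$$F_0 = \frac{s_1^2}{s_2^2},$$

which is the ratio of the independent sample estimates of two assumed
equal variances.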

A tabular value, $F_\alpha$, is obtained for $n_1$ and $n_2$ degrees of freedom, such
that

$$P(F > F_\alpha) = \int_{F_\alpha}^{\infty} f(F)\, dF = \alpha.$$

$\alpha$ is the probability that a value of F as large as or larger than $F_\alpha$ could
have been obtained from two random samples from two normal
populations with the same variances. For a fixed $\alpha$, $F_0$ is compared with $F_\alpha$.
If $F_0$ is greater than $F_\alpha$, this is evidence that $\sigma_1^2 > \sigma_2^2$. If $F_0$ is less than
$F_\alpha$, there is insufficient evidence to state that $\sigma_1^2 > \sigma_2^2$. It should be
reemphasized that there is always a chance of being wrong in stating that
$\sigma_1^2 > \sigma_2^2$ when $F_0 > F_\alpha$. If we make a large number of such tests, we
would expect to be wrong less than $100\alpha$ per cent of the time. If we
further protect ourselves from such mistakes by decreasing $\alpha$, we shall
unfortunately reduce the number of times real differences are indicated.
A further discussion of this problem will be taken up in Chap. 11.

In the above discussions we have been assuming that the only
alternative to $\sigma_1^2 = \sigma_2^2$ is that $\sigma_1^2 > \sigma_2^2$. Of course, even if $\sigma_1^2 > \sigma_2^2$, the sample $F_0$
might be less than 1. Under certain conditions of an applied problem it
may not be valid to assume that the only alternative to the hypothesis
$\sigma_1^2 = \sigma_2^2$ is $\sigma_1^2 > \sigma_2^2$. If $\sigma_1^2$ may also be less than $\sigma_2^2$, then the alternative
hypothesis is $\sigma_1^2 \ne \sigma_2^2$, and it becomes necessary to consider both tails
of the F distribution in testing the hypothesis. In this case, we compute
$F_0$ as the larger mean square divided by the smaller and compare it with
$F_{\alpha/2}$ instead of $F_\alpha$, if we wish to test at the $\alpha$ significance probability level.
Or, conversely, we can use $F_\alpha$ as the tabular value and test at the $2\alpha$
probability level.
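The one-tailed and two-tailed procedures differ only in which mean square is placed in the numerator and in which tabular value is used. A minimal sketch follows; the variances and degrees of freedom are hypothetical, the function name is illustrative, and the tabular value 3.85 is assumed to be the table entry $F_{.025}$ for 8 and 10 degrees of freedom, as it would be read from a table.

```python
# Sketch: forming the variance ratio F0 for a one-tailed or a two-tailed test.

def variance_ratio(s1_sq, df1, s2_sq, df2, two_tailed=False):
    """Return (F0, numerator d.f., denominator d.f.).

    One-tailed: F0 = s1^2 / s2^2, whatever its size.
    Two-tailed: the larger mean square is placed in the numerator, and the
    degrees of freedom are interchanged to match.
    """
    if two_tailed and s2_sq > s1_sq:
        return s2_sq / s1_sq, df2, df1
    return s1_sq / s2_sq, df1, df2


# Hypothetical estimates: 12.0 with 10 d.f. and 30.0 with 8 d.f.
f0, num_df, den_df = variance_ratio(12.0, 10, 30.0, 8, two_tailed=True)
f_tabular = 3.85        # assumed table entry: F_.025 for 8 and 10 d.f.
evidence_of_difference = f0 > f_tabular

print(f0, num_df, den_df, evidence_of_difference)   # 2.5 8 10 False
```

Comparing with $F_{.025}$ corresponds to a two-tailed test at the $\alpha = .05$ level.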

In Sec. 7.11 it was shown that t may be used to test the hypothesis that
$\mu_1 = \mu_2$ on the assumption that $\sigma_1^2 = \sigma_2^2$. Suppose that for the two
samples $\{X_{1i}\}$ and $\{X_{2i}\}$ there is some reason to believe that the population
variances are not equal but that there is no a priori reason for one to be
larger than the other. In this case it is necessary to use the two-tailed
F test in testing for the significance of the difference between the two
variances. On the other hand, in applying the F test to an
analysis-of-variance table, to be explained later, the usual alternative hypothesis is
$\sigma_1^2 > \sigma_2^2$, and a single-tailed F test is used.

Tabular values of $F_{.05}$ and $F_{.01}$ are presented in Table IV in the
Appendix for various values of $n_1$ and $n_2$. Norton has computed values of
$F_{.20}$ and Colcord and Deming of $F_{.001}$. All of these tables may be found
in reference 14. See Bancroft² for a complete description of where such
tables can be found.

EXERCISES 

7.1. (a) Set up the joint probability distribution of the sample of size
n from a bivariate normal distribution $(X_1, X_2)$ with means $\mu_1$ and $\mu_2$,
variances $\sigma_1^2$ and $\sigma_2^2$, and correlation $\rho$. (b) Show that the simultaneous
distribution of $\bar X_1$ and $\bar X_2$ is

$$\frac{n}{2\pi\sigma_1\sigma_2\sqrt{1 - \rho^2}}\; e^{-Q}\, d\bar X_1\, d\bar X_2,$$

where

$$Q = \frac{n}{2(1 - \rho^2)} \left[\frac{(\bar X_1 - \mu_1)^2}{\sigma_1^2} + \frac{(\bar X_2 - \mu_2)^2}{\sigma_2^2} - \frac{2\rho(\bar X_1 - \mu_1)(\bar X_2 - \mu_2)}{\sigma_1\sigma_2}\right],$$

$$-\infty < \bar X_1 < \infty, \qquad -\infty < \bar X_2 < \infty.$$

7.2. Given a normal population with $\sigma^2 = 25$. A sample of 12
produced the following values: 24, 30, 26, 33, 21, 24, 20, 32, 24, 30, 33, 34.

(a) Test the hypothesis that the true mean of the population is 24,
using the $\alpha = .05$ significance probability.

(b) Make the same test as (a) if $\alpha = .01$.

What is the difference between the two tests?

(c) Set up a 95 per cent confidence interval for the true mean.

(d) What is the connection between the test of (a) and the confidence
interval of (c)?

7.3. Given the distribution of $\chi^2$ as in Sec. 7.5, find the distribution of


y = 


1 + x 2 ’ 


7.4. Given the two functions

$$u = a_1\chi_1^2 + a_2\chi_2^2,$$

$$v = \chi_1^2 + \chi_2^2.$$

Find the distribution of $w = u/v$. What effect would the provision
$a_1 > a_2 > 0$ have on the results?

7.5. If $X_1$ and $X_2$ are NID(0,1), show that the variables r and $\theta$ defined
as follows,

$$X_1 = r \sin\theta, \qquad X_2 = r \cos\theta,$$

are independently distributed and that $r^2$ is distributed as $\chi^2$ with 2
degrees of freedom.

7.6. Given the bivariate normal distribution with means zero and
variances $\sigma_1^2$ and $\sigma_2^2$ and correlation $\rho$. Show that

$$\theta = \frac{1}{1 - \rho^2}\left(\frac{X_1^2}{\sigma_1^2} + \frac{X_2^2}{\sigma_2^2} - \frac{2\rho X_1 X_2}{\sigma_1\sigma_2}\right)$$

is distributed as chi-square with 2 degrees of freedom. Hint. Show that
the moment generating function of $\theta$ is the same as that of $\chi^2$ with 2
degrees of freedom.

7.7. Using the result in Exercise 7.6, find the probability that $\theta < c^2$.

7.8. In Exercise 7.6 let $\sigma_1^2 = \sigma_2^2 = 1$ and $\rho = 0$. Find the equation of
the circle with center at the origin which would contain 99 per cent of a
large number of samples of $X_1$ and $X_2$.

7.9. A random sample of n individuals $\{X_i\}$ is drawn from a N(0,1)
population. Suppose the sample is subdivided into r subclasses with
two or more in each ($n_1, n_2, \ldots, n_r$ being the numbers in the subclasses).
The sum of squares $q_i$ ($i = 1, 2, \ldots, r$) of deviations from the subclass
mean is computed for each subclass. What is the distribution of $\sum_{i=1}^{r} q_i$?

7.10. Compute the 1 per cent and 5 per cent tabular values of $\chi^2$ for
$\nu = 1$, 2, and 4. Check with the results in Table II in the Appendix.

7.11. Given $(s')^2/\sigma^2 = 2$ with $n = 4$. What is the probability of
obtaining a value this large or larger? Check your results by
interpolating with the tabled values.

7.12. Use the data in Exercise 7.2 to test the hypothesis that $\sigma^2 = 25$,
assuming $\mu = 30$ and $\alpha = .05$. Also set up 95 per cent confidence
limits for $\sigma^2$.

7.13.† (a) Show that the results of Sec. 7.7 can now be extended to
test hypotheses and to set up confidence limits for $\sigma^2$ without assuming
$\mu$ known, where $s^2$ replaces $(s')^2$.

(b) Use the data in Example 7.1 to test the hypothesis that $\sigma^2 = 90$,
when no assumption is made concerning $\mu$. Set up 95 per cent confidence
limits for $\sigma^2$.

(c) Repeat Exercise 7.12 when no assumption is made concerning $\mu$.

(d) What is the advantage of the use of $s^2$ instead of $(s')^2$ in making
statements concerning $\sigma^2$?

7.14. Show that the distribution of $s^2$ approaches normality for large n.
What are the skewness and kurtosis of $s^2$?

7.15. Find the moment generating function of $\log_e s$, where $s^2$ is an
unbiased estimate of variance. Show that the distribution of $\log_e s$ is
independent of $\sigma^2$ apart from its mean value.

† Exercise 7.13 should be worked by everyone.

7.16. Given $s^2/\sigma^2 = 2$ with $n = 5$, where $s^2 = \sum_{i=1}^{n} (X_i - \bar X)^2/(n - 1)$.
What is the probability of obtaining a value this large or larger? Check
your results by interpolating with the tabular values.

7.17. The distribution of t for 1 degree of freedom is the so-called
“Cauchy distribution.” What about its mean and variance? Compute
the tabular values of this t for $\alpha = .01$ and .05, using the two-tailed
test.

7.18. (a) Given a random sample of n paired observations (A and B).
The sample mean of the A’s is $\bar X_a$, and that of the B’s is $\bar X_b$. The A’s
and B’s come from normal populations with the same variance. It is
desired to test whether the population means are equal. The value of
the ith members of A and B may be represented, respectively, as

$$A_i = \mu_a + p_i + e_{ai},$$

$$B_i = \mu_b + p_i + e_{bi},$$

where $\mu_a$ and $\mu_b$ are the population mean effects of A and B, $p_i$ represents
the effect common to the two paired observations, assumed $N(0,\sigma_p^2)$.
The $e_{ai}$ and $e_{bi}$ represent residual effects for each observation not explained
by the $\mu$’s and $p_i$ and independent of them. The $e_{ai}$ and $e_{bi}$ are assumed
$N(0,\sigma^2)$.

If

$$\bar X_a = \frac{\sum_{i=1}^{n} A_i}{n}, \qquad \bar X_b = \frac{\sum_{i=1}^{n} B_i}{n},$$

and

$$s^2 = \frac{\sum_{i=1}^{n} [(A_i - B_i) - (\bar X_a - \bar X_b)]^2}{n - 1},$$

show that

$$\frac{\bar X_a - \bar X_b}{s/\sqrt{n}}$$

follows the t distribution with $(n - 1)$ degrees of freedom.

(b) Compare $s^2$ in (a) with that obtained by pooling the two sample
sums of squares of deviations.

7.19. Show that the t distribution approaches normality for large n.

7.20. The distribution of the correlation coefficient r in samples of n
from a normal bivariate population with $\rho = 0$ is

$$f(r)\, dr = \frac{1}{B\left(\frac{1}{2}, \frac{n-2}{2}\right)} (1 - r^2)^{(n-4)/2}\, dr, \qquad -1 < r < 1.$$

Show that

$$\frac{r\sqrt{n - 2}}{\sqrt{1 - r^2}}$$

is distributed as t with $(n - 2)$ degrees of freedom.

7.21. Compute the 5 per cent tabular value of t (two-tailed test) for
2 degrees of freedom. What is the probability that $|t| > 3$?

7.22. In a certain test, given to the 45 members of a class in statistics,
20 women students had an average score of 40 with a variance of 16,
while 25 men students had an average of 46 with a variance of 16. What
is the probability of obtaining such results if both groups were equally
well prepared for the test? What assumptions are made in obtaining
this probability value? On the basis of this sample and the validity of
the assumptions is there much evidence that both groups are not equally
well prepared? What are the 95 per cent confidence limits for the
difference between the two means?

7.23. Given the following data on the gains (in pounds) by 27 hogs, in
individual and similar pens, 12 fed ration A and 15 ration B. Test the
hypothesis that the population mean gains are the same, and set up
99 per cent confidence limits for the difference between the two means.

A: 25, 30, 28, 34, 24, 25, 13, 32, 24, 30, 31, 35

B: 44, 34, 22, 8, 47, 31, 40, 30, 32, 35, 18, 21, 35, 29, 22

7.24. In a certain school 34 girl beginners in the first grade were selected
and paired on the basis of IQ, socioeconomic rating of family, general
health, and family size. One member of each pair had attended one
year of kindergarten, while the other had not. On a certain first-grade
readiness test given to all 34 pupils the scores were:

    Kindergarten      65 68 70 63 64 62 73 75 72 78 64 73 79 80 67 74 82
    No kindergarten   63 68 68 60 65 60 72 75 73 70 66 70 77 78 63 74 78

Is there significant evidence from this investigation that kindergarten is
of benefit in preparing for the first grade? In making use of the t test
what null hypothesis is made concerning $\mu = \mu_1 - \mu_2$? How many
degrees of freedom do you use here? What must be true concerning
the distribution of the paired differences? Set up 95 per cent confidence
limits for $\mu$.

7.25. Cochran¹³ presents an example of an experiment with a control
and six chalk and lime treatments on the number of mangolds per plot.
Suppose we compare the control numbers with those having a large
amount of chalk or lime. The numbers of mangolds per plot were:

                                                        Total
    Control   140, 142, 36, 129, 49, 37, 114, 125         772
    Treated   117, 137, 137, 143, 130, 112, 130, 121     1027

(a) Calculate the sample mean and variance for each group.

(b) Assume that the two population variances are equal, and test for
the difference between the two population means.

(c) Assume that the two population variances are unequal. Test for
the difference between the two population means by the Cochran and
Cox, Smith-Satterthwaite, and Dixon and Massey tests.

7.26. Show that in Sec. 7.12

$$C = \frac{(n_1/n_2)^{n_1/2}}{B(n_1/2, n_2/2)}.$$

7.27. Show that

l“m*F = i- 


7.28. When $n_1 = 1$, show that $F = t^2$.

7.29. Let $F(n_1,n_2)$ be the distribution of F with $n_1$ degrees of freedom
for the numerator and $n_2$ for the denominator. (a) Derive the
distribution of

$$F' = F(n_2,n_1) = \frac{1}{F(n_1,n_2)}.$$

(b) Hence show that

$$P\left[F(n_2,n_1) < \frac{1}{\lambda}\right] = P[F(n_1,n_2) > \lambda].$$

(c) Use this result to show why we can use $F_\alpha$ as a tabular value to
test at the $2\alpha$ probability level for a two-tailed F test.

7.30. Show that $P(F > F_\alpha)$ is an incomplete Beta function. Set up
this function. Hint: Let

$$X = \frac{n_1 F/n_2}{1 + (n_1 F/n_2)}.$$

7.31. Use the result in Exercise 7.30 to determine a general formula


for in of F. 

7.32. Determine the 5 per cent tabular value for F if (a) $n_1 = 2$, $n_2 = 4$,
and (b) $n_1 = 4$, $n_2 = 4$.

7.33. What is $P(F > 4)$ for $n_1 = 4$, $n_2 = 4$?

7.34. Given $s_1^2 = 40$ and $s_2^2 = 10$, each based on 4 degrees of freedom.
Determine the probability that sample variances as divergent as these
could be estimates of the same population variance with an alternative
hypothesis that $\sigma_1^2 \ne \sigma_2^2$.

7.35. Given that one mean square, 80, with 4 degrees of freedom, is an
estimate of $\sigma_1^2 + 5\sigma_2^2$, while another mean square, 10, with 20 degrees of
freedom, is an estimate of $\sigma_1^2$. Use an F table to test the hypothesis that
$\sigma_2^2 = 0$.

7.36. Test the hypothesis that the two population variances in
Exercise 7.25 were equal.

7.37. R. A. Fisher first considered the problem of the ratio of two
variances as the difference between the logarithms of the two variances.
If we let

$$z = \tfrac{1}{2} \log_e F,$$

we obtain his z distribution, which is more nearly normal than the F
distribution. Show that

$$f(z)\, dz = 2C e^{n_1 z}\left(1 + \frac{n_1}{n_2} e^{2z}\right)^{-(n_1+n_2)/2} dz, \qquad -\infty < z < \infty.$$

7.38. Show that $f(z)$ is symmetrical if $n_1 = n_2$.

References Cited

1. Pearson, Karl, Tables of the Incomplete Γ-Function, H.M. Stationery Office,
London, 1922.

2. Bancroft, T. A., “Probability Values for the Common Tests of Hypotheses,”
J. Am. Stat. Assoc., 45:211-217 (1950).

3. Thompson, C. M., “Tables of Percentage Points of the Incomplete Beta Function
and of the Chi-square Distribution,” Biometrika, 32:151-181, 187-191 (1941).

4. Bartlett, M. S., “Complete Simultaneous Fiducial Distribution,” Ann. Math.
Stat., 10:129ff. (1939).

5. Bartlett, M. S., “The Information Available in Small Samples,” Proc. Cambridge
Phil. Soc., 32:560ff. (1936).

6. Welch, B. L., “Significance of the Difference between Two Means When the
Population Variances Are Unequal,” Biometrika, 29:350-362 (1937).

7. Fisher, R. A., “The Fiducial Argument in Statistical Inference,” Ann. Eugenics,
6:391-398 (1935).

8. Sukhatme, P. V., “On the Fisher-Behrens Test of Significance for the Difference
in Means of Two Normal Samples,” Sankhya, 4:39ff. (1938).

9. Cochran, W. G., and G. M. Cox, Experimental Designs, John Wiley & Sons, Inc.,
New York, 1950.

10. Smith, H. F., “The Problem of Comparing the Results of Two Experiments with
Unequal Errors,” J. Sci. Ind. Research (India), 9:211-212 (1936).

11. Satterthwaite, F. E., “An Approximate Distribution of Estimates of Variance
Components,” Biometrics Bull., 2:110-114 (1946).

12. Dixon, W. J., and F. J. Massey, Jr., Introduction to Statistical Analysis,
McGraw-Hill Book Company, Inc., New York, 1951.

13. Cochran, W. G., “Some Consequences When the Assumptions for the Analysis
of Variance Are Not Satisfied,” Biometrics, 3:22-38 (1947).

14. Fisher, R. A., and F. Yates, Statistical Tables, Oliver & Boyd, Ltd., Edinburgh
and London, 1938.





CHAPTER 8 


INTRODUCTION TO POINT ESTIMATION AND CRITERIA 
OF “GOODNESS” 

8.1. Introduction. A worker in animal husbandry may wish to
estimate the mean gain in weight of swine, of the same breed, sex, and
age, fed on the same ration and managed similarly for a period of 20
days. To this end the following sample set of gains in pounds was
obtained: 27, 28, 28, 29, 29, 29, 30, 30, 30, 30, 31, 31, 31, 32, 32, 33. Now,
if it be assumed that these observations are a random sample from a
population of such gains which are normally distributed with mean,
$\mu$, then we must decide what function of the sample observations must be
used to obtain a “good” or “best” estimate of $\mu$. We must, of course,
define “good” or “best” in some technical sense. As will be shown later,
it turns out that the sample mean, $\bar X = 30$, is a “good” estimate of $\mu$.
Our investigation concerned itself then with drawing a sample from some
population of specified mathematical form for the purpose of estimating
a parameter of the population. In subsequent discussions the function
of the observations chosen to estimate the population parameter, in
this case $\bar X$, will be called an estimator, while the particular numerical
value obtained in an application will be called an estimate.

8.2. The Problem of Point Estimation. A sample $X_1, X_2, \ldots, X_n$ is
specified as having been drawn from a common population with
distribution function $f(X; \theta_1, \theta_2, \ldots, \theta_k)$, where X is the variate and $\theta_1, \theta_2, \ldots,
\theta_k$ are population parameters. We wish to find functions of the
observations, say $\hat\theta_1(X_1, X_2, \ldots, X_n)$, $\hat\theta_2(X_1, X_2, \ldots, X_n)$, $\ldots$, $\hat\theta_k(X_1, X_2,
\ldots, X_n)$, such that the distribution of these functions will be
concentrated closely about the respective true values of the parameters. By
saying that the distribution of $\hat\theta_1(X_1, X_2, \ldots, X_n)$ will be concentrated
closely about the true value we can mean one of several conditions, such as

(a) The probability that the estimator falls within a short distance of
the true value shall be large regardless of the fact that this requirement
may be satisfied only by an estimator which is distributed in such a way
that there is a possibility (though small) of a very large deviation from
the parameter.

(b) The probability that the estimator falls more than a specified
distance from the parameter shall be negligible, or even zero, regardless of
how the estimator is distributed inside this region.

(c) Or we may be willing to have the estimator deviate in one direction
from the parameter but may wish to minimize the chance of large
deviations in the other direction.

These three conditions may be represented by estimators which are
distributed as shown in Fig. 8.1 ($\theta$ is the parameter).

There are other considerations which might influence one in deciding
whether or not a given estimator is a “good” one. For example, is the
estimator appropriate for small samples as well as large? It also might
be possible to set down a cost of a given deviation of an estimator from
the parameter, the cost presumably being an increasing function of the
size of the deviation. Then, if we could determine this cost function for
various proposed estimators, it would be possible to estimate the total
cost for each and adopt that estimator which produced a minimum cost.

Most of these problems of estimation are resolved if all the estimators 
are normally distributed. In this case, the best estimator can reasonably 
be assumed to be that one which has the minimum variance. However, 
it is difficult to say which estimator is superior if one is distributed nor¬ 
mally and the other uniformly, for example. Although most common 
estimators are asymptotically normally distributed (n large), few are 
normal for small samples and many are far from normal in this case. 
On the whole we can advance only a few guiding principles to be used in 
deciding whether an estimator is “good” or not. While these principles 
have yielded fruitful results, much work remains to be done, especially 
regarding nonnormally distributed estimators and nonrandom samples. 
In the following sections we shall define certain characteristics of a
“good” estimator and introduce a method which sometimes yields an
estimator which satisfies all of these characteristics. Much of the
philosophy and theory of estimation are a result of the work of R. A. Fisher
(see reference 1).

8.3. Unbiasedness. An estimator $\hat\theta$ is said to be unbiased if $E(\hat\theta) = \theta$.
It is said to be positively or negatively biased, respectively, according
to whether $E(\hat\theta) > \theta$ or $E(\hat\theta) < \theta$. Unbiasedness is a desirable but not
necessarily an indispensable property of a “good” estimator. If the amount
of bias is small compared with the standard deviation of the estimator, the
estimator though biased may be entirely satisfactory.
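Unbiasedness can be illustrated exactly on a very small case by listing every possible sample. The sketch below assumes a hypothetical population taking the values 1, 2, 3 with equal probability and samples of size 2 drawn with replacement; it compares the expected values of the sample mean, of the variance estimate with divisor $(n - 1)$, and of the corresponding estimate with divisor n. The population and the names used are illustrative only.

```python
from fractions import Fraction
from itertools import product

# Sketch: exact expected values of three estimators, found by enumerating all
# equally likely samples of size n from a small hypothetical population.

population = [1, 2, 3]          # hypothetical, equally likely values
n = 2                           # sample size, drawn with replacement

mu = Fraction(sum(population), len(population))
sigma_sq = sum((Fraction(x) - mu) ** 2 for x in population) / len(population)


def expected_value(estimator):
    """Average of the estimator over all equally likely samples."""
    samples = list(product(population, repeat=n))
    return sum(estimator(s) for s in samples) / len(samples)


def sum_sq_dev(sample):
    """Sum of squared deviations from the sample mean."""
    mean = Fraction(sum(sample), len(sample))
    return sum((Fraction(x) - mean) ** 2 for x in sample)


e_mean = expected_value(lambda s: Fraction(sum(s), n))
e_divisor_n_minus_1 = expected_value(lambda s: sum_sq_dev(s) / (n - 1))
e_divisor_n = expected_value(lambda s: sum_sq_dev(s) / n)

print(mu, e_mean)                                   # 2 2
print(sigma_sq, e_divisor_n_minus_1, e_divisor_n)   # 2/3 2/3 1/3
```

The estimate with divisor n has expected value 1/3 against a population variance of 2/3, so it is negatively biased in this case, while the other two estimators are unbiased.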

8.4. Consistency. An estimator $\hat\theta$ is said to be a consistent
estimator of $\theta$ if $\hat\theta$ converges stochastically to $\theta$ as
$n$ approaches $\infty$. Symbolically, $\hat\theta$ will converge
stochastically to $\theta$ as $n$ approaches $\infty$ if, for two arbitrarily
small positive numbers, $\epsilon$ and $\eta$, a large enough sample can be
taken so that

\[ P[|\hat\theta - \theta| > \epsilon] < \eta. \]

A useful relation for proving consistency is furnished by Tchebysheff's
inequality: Given a random variate $X$ with distribution function $f(X)$,
mean $\mu$, and variance $\sigma^2$ assumed to exist, then for a given
$\delta\ (> 0)$

\[ P[|X - \mu| > \delta\sigma] < \frac{1}{\delta^2}. \]

The inequality may be established from Fig. 8.2. It follows that

\[ \sigma^2 = \int_{-\infty}^{\infty} (X - \mu)^2 f(X)\,dX
 = \int_{-\infty}^{\mu - \delta\sigma} (X - \mu)^2 f(X)\,dX
 + \int_{\mu - \delta\sigma}^{\mu + \delta\sigma} (X - \mu)^2 f(X)\,dX
 + \int_{\mu + \delta\sigma}^{\infty} (X - \mu)^2 f(X)\,dX. \]

Also, since the second integral is positive,

\[ \sigma^2 > \int_{-\infty}^{\mu - \delta\sigma} (X - \mu)^2 f(X)\,dX
 + \int_{\mu + \delta\sigma}^{\infty} (X - \mu)^2 f(X)\,dX. \]

Since, in the range of integration, $|X - \mu| > \delta\sigma$, the factor
$(X - \mu)^2$ may be replaced by $\delta^2\sigma^2$ and

\[ \sigma^2 > \delta^2\sigma^2 \int_{-\infty}^{\mu - \delta\sigma} f(X)\,dX
 + \delta^2\sigma^2 \int_{\mu + \delta\sigma}^{\infty} f(X)\,dX. \]

It is now evident that

\[ \frac{1}{\delta^2} > P(|X - \mu| > \delta\sigma), \]

or

\[ P[|X - \mu| > \delta\sigma] < \frac{1}{\delta^2}. \]

It should be noted that this inequality holds for any distribution but is
not very efficient for a normal distribution. For example, if $X$ is
$N(\mu,\sigma^2)$, then the exact probability, $\alpha$, for $\delta = 2$ is
about .05, while Tchebysheff's inequality gives $\alpha < .25$.
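The comparison just made for $\delta = 2$ can be tabulated for other values of $\delta$. The following is a minimal sketch in Python, not part of the original text; the function names `chebyshev_bound` and `normal_two_sided_tail` are illustrative choices, and the exact normal tail is obtained from the complementary error function.

```python
import math

def chebyshev_bound(delta):
    """Tchebysheff upper bound 1/delta^2 on P(|X - mu| > delta*sigma)."""
    return 1.0 / delta ** 2

def normal_two_sided_tail(delta):
    """Exact P(|X - mu| > delta*sigma) when X is normally distributed."""
    return math.erfc(delta / math.sqrt(2.0))

# Tabulate the exact normal probability beside the distribution-free bound.
for delta in (1.5, 2.0, 3.0):
    exact = normal_two_sided_tail(delta)
    bound = chebyshev_bound(delta)
    print(f"delta={delta}: exact={exact:.4f}, bound={bound:.4f}")
```

For $\delta = 2$ the exact probability is close to .05, against the bound of .25, in agreement with the remark above.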

In the limit a consistent estimator will necessarily be unbiased although
for finite sample sizes the consistent estimator may be biased. An
unbiased estimator may or may not be consistent. It can be proved
(Cramér, reference 2) that an unbiased estimator will be consistent if
$\sigma_{\hat\theta}^2 \to 0$ as $n \to \infty$.
Note that consistency is a large sample criterion, while bias is applied to
small samples as well.

There are usually many consistent estimators of $\theta$; hence, other
criteria are needed to select the “best” from among those with the property
of consistency. In general, consistency is a desirable property of an
estimator.

Example 8.1. Since $E(\bar X) = \mu$, the sample mean, $\bar X$, is an
unbiased estimator of the population mean, $\mu$. Also, if $\sigma^2$ exists,
$\sigma_{\bar X}^2 = \sigma^2/n$ and then, by Tchebysheff's inequality,

\[ P\left[|\bar X - \mu| > \frac{\delta\sigma}{\sqrt{n}}\right] < \frac{1}{\delta^2}. \]

Now, if we set $\epsilon = \delta\sigma/\sqrt{n}$ and $\eta = 1/\delta^2$, then
$\bar X$ will converge stochastically to $\mu$, if we choose
$n > \delta^2\sigma^2/\epsilon^2$. Hence $\bar X$ is a consistent estimator
for $\mu$.

Example 8.2. It can be shown that the distribution of the median,
$m$, for a random sample of $n$ from $N(\mu,\sigma^2)$ approaches
$N(\mu,\sigma_m^2)$ for $n$ large, with $\sigma_m^2 = \pi\sigma^2/2n$. Hence,
for large $n$, $m$ may be taken as unbiased and is also a consistent
estimator of $\mu$ for normally distributed data. [It can be shown in general
that $\sigma_m^2 = 1/[4nf^2(m)]$ if $f(m) \neq 0$. $\sigma_m^2$ is the
asymptotic variance.]

8.5. Efficiency. Since there are usually many consistent estimators
of a given parameter $\theta$, we require some additional criterion to use in
deciding which of these consistent estimators is the “best.” Also, there
might be cases in which it is desirable to use an estimator for which the
limiting value deviates by a small amount from the parameter, that is, an
inconsistent estimator, if such an estimator were superior in other
respects. Another criterion advanced by R. A. Fisher is that the
estimator shall have a minimum variance in large samples. An estimator
possessing this property is said to be efficient.

Definition. $\hat\theta$ is an efficient estimator of $\theta$ (i) if
$\sqrt{n}\,(\hat\theta - \theta)$ approaches $N(0,\sigma^2)$ as
$n \to \infty$, and (ii) for any other estimator $\theta'$ for which
$\sqrt{n}\,(\theta' - \theta)$ approaches $N(0,\sigma'^2)$,
$\sigma'^2 \geq \sigma^2$.

The efficiency of $\theta'$ is given by $E_f = \sigma^2/\sigma'^2$.† We shall
consider later an estimation method which provides an estimator with
minimum variance.

Example 8.3. Consider two consistent estimators of $\mu$ furnished by a
random sample of $n$ taken from $N(\mu,\sigma^2)$, that is, the mean,
$\bar X$, and the median, $m$. We know that $\sigma_{\bar X}^2 = \sigma^2/n$
and also for large samples that $\sigma_m^2 = \pi\sigma^2/2n$. The efficiency
of the median relative to the mean is given by

\[ E_f = \frac{\sigma^2/n}{\pi\sigma^2/2n} = \frac{2}{\pi} = .64. \]
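The value $2/\pi$ is a large-sample result, and it may be checked roughly by sampling experiment. The sketch below is an illustration only, not part of the original text: the sample size, number of repetitions, and seed are arbitrary assumptions, and the function name `relative_efficiency_of_median` is hypothetical.

```python
import math
import random
import statistics

def relative_efficiency_of_median(n, reps, seed=0):
    """Estimate var(mean)/var(median) for samples of n from N(0,1)."""
    rng = random.Random(seed)
    means, medians = [], []
    for _ in range(reps):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        means.append(statistics.fmean(sample))
        medians.append(statistics.median(sample))
    # Ratio of the two sampling variances estimated over the repetitions.
    return statistics.pvariance(means) / statistics.pvariance(medians)

eff = relative_efficiency_of_median(n=101, reps=4000)
print("simulated efficiency:", round(eff, 3))
print("asymptotic value 2/pi:", round(2.0 / math.pi, 3))
```

The simulated ratio should fall near .64, subject to sampling error and to the fact that $n$ is finite.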


Example 8.4. Consider the mid-range, MR, as an estimator of
$\mu$. MR is the average of the largest $X$ and the smallest $X$ in the
sample. It can be shown that if the $n$ members of the sample $\{X_i\}$ are
$N(\mu,\sigma^2)$, then $E(\mathrm{MR}) = \mu$ and

\[ \sigma_{MR}^2 = \frac{\pi^2\sigma^2}{24 \log n} + O\!\left(\frac{1}{\log^2 n}\right). \]

Hence, the efficiency of MR, using the first term only of $\sigma_{MR}^2$,
relative to the mean, $\bar X$, is given by

\[ E_f = \frac{\sigma^2/n}{\pi^2\sigma^2/(24 \log n)} = \frac{24 \log n}{\pi^2 n}. \]

But $\lim_{n \to \infty} E_f = 0$. Hence, although MR is an unbiased and
consistent estimator for $\mu$, it is decidedly inefficient in large samples
as compared with the mean, $\bar X$.

Example 8.5. Consider the efficiencies of $\bar X$, $m$, and MR as
estimators of $\mu$ from a sample of $n$ from the rectangular distribution:
$f(X) = 1/w$, $0 < X < w$, $w < \infty$; $\mu = w/2$, and
$\sigma^2 = w^2/12$. All three estimators are unbiased. The variances are

\[ \sigma_{\bar X}^2 = \frac{w^2}{12n} = \frac{\sigma^2}{n}, \qquad
   \sigma_m^2 \approx \frac{3\sigma^2}{n}, \qquad
   \sigma_{MR}^2 = \frac{6\sigma^2}{(n + 1)(n + 2)}. \]

Hence each estimator is consistent. If MR were asymptotically normally
distributed, we could infer that the efficiency, $E_f$, of the mean or the
median relative to mid-range would be zero. However, MR is not so
distributed: hence, we cannot use the criterion of efficiency in making such
a comparison. The efficiency of $m$ relative to $\bar X$ is 1/3.

† We shall use the symbol $E_f$ to represent efficiency to distinguish it
from $E$, which stands for expected value of.
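The variances quoted for the rectangular distribution may be checked by a sampling experiment. The following is a rough sketch under stated assumptions, not part of the original text: $w = 1$, $n = 25$, the number of repetitions, and the seed are arbitrary, and `simulate_variances` is a hypothetical helper. The formula for the median is a large-sample one, so only the mean and the mid-range are compared closely.

```python
import random
import statistics

def simulate_variances(n, w=1.0, reps=20000, seed=1):
    """Sampling variances of mean, median and mid-range for uniform(0, w)."""
    rng = random.Random(seed)
    means, medians, midranges = [], [], []
    for _ in range(reps):
        xs = [rng.uniform(0.0, w) for _ in range(n)]
        means.append(statistics.fmean(xs))
        medians.append(statistics.median(xs))
        midranges.append((min(xs) + max(xs)) / 2.0)
    return tuple(statistics.pvariance(v) for v in (means, medians, midranges))

n, w = 25, 1.0
sigma2 = w ** 2 / 12.0                        # population variance w^2/12
var_mean, var_median, var_midrange = simulate_variances(n, w)

print("mean:     ", var_mean, "theory", sigma2 / n)
print("median:   ", var_median, "large-sample theory", 3.0 * sigma2 / n)
print("mid-range:", var_midrange, "theory", 6.0 * sigma2 / ((n + 1) * (n + 2)))
```

For this distribution the mid-range has much the smallest variance of the three, even though the efficiency criterion of this section cannot be applied to it.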

8.6. Sufficiency. Another criterion which is useful for small samples
is sufficiency.

Definition. $\hat\theta$ is said to be a sufficient estimator for estimating
$\theta$ if the joint distribution of the sample $\{X_i\}$ of $n$
observations may be put in the form

\[ \prod_{i=1}^{n} f(X_i;\theta) = g(X_1,X_2,\ldots,X_n|\hat\theta) \cdot h(\hat\theta;\theta), \]

where $g(X_1,X_2,\ldots,X_n|\hat\theta)$ does not involve $\theta$.

In words, it must be possible to subdivide the joint distribution of the
sample into the conditional distribution of $(X_1,X_2,\ldots,X_n)$ given
$\hat\theta$ multiplied by the joint distribution of $\hat\theta$ and
$\theta$ in such a way that the conditional distribution does not involve
$\theta$.

In the factored form it is easy to see that no other estimator, say
$\theta'$, can provide any additional information about $\theta$. For the
distribution of $\theta'$ for a fixed $\hat\theta$ will be determined by
$g(X_1,X_2,\ldots,X_n|\hat\theta)$ and will involve $\hat\theta$ but not
$\theta$. Hence $\theta'$ will provide information about $\hat\theta$ but not
$\theta$. But, for any given problem, $\hat\theta$ is known, so that this
information is of no value. In the language of R. A. Fisher, a sufficient
estimator is one which exhausts all of the information in the sample.

It should be emphasized that sufficiency is not an asymptotic property;
it does not require that $n$ be increased without limit or that
$\hat\theta$ be distributed normally for large $n$. In view of the above
discussion it would appear appropriate to consider a sufficient estimator as
the “best” estimator. Certainly if an estimator of $\theta$ is not a function
of $\theta$, that is, if its distribution does not involve $\theta$, it can
be of no use in estimating $\theta$. Conversely, if an estimator exhausts all
the information in a sample, it seems useless to consider any other
estimator. Unfortunately there is no sufficient estimator for many
parameters.

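8.7. Amount of Information and a Measure of Efficiency for Small
Samples. The minimum variance for $\hat\theta$ is given by

\[ \sigma_{\hat\theta,\min}^2 = \frac{1}{n \displaystyle\int_{-\infty}^{\infty} \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\,dX}, \]

where $f(X;\theta)$ has been abbreviated to $f$. This minimum variance can be
achieved only if $\hat\theta$ is an unbiased sufficient estimator of
$\theta$ and if

\[ \frac{\partial \log h(\hat\theta;\theta)}{\partial\theta} = k(\hat\theta - \theta), \]

where $k$ may be a function of $\theta$ but is independent of $\hat\theta$,
and $h(\hat\theta;\theta) > 0$.
R. A. Fisher has designated the reciprocal of $\sigma_{\hat\theta,\min}^2$
the amount of information in the sample of $n$, that is,

\[ I = n \int_{-\infty}^{\infty} \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\,dX = \frac{1}{\sigma_{\hat\theta,\min}^2}. \]

Now, if $\hat\theta$ is the estimator with minimum variance, then the
efficiency of any other estimator, $\theta'$, will be measured by

\[ E_f = \frac{\sigma_{\hat\theta,\min}^2}{\sigma_{\theta'}^2} = \frac{1}{I\,\sigma_{\theta'}^2}. \]

If the estimator $\hat\theta$ is asymptotically normally distributed so that

\[ \log [h(\hat\theta;\theta)] = \text{constant} - \frac{(\hat\theta - \theta)^2}{2\sigma_{\hat\theta}^2}, \]

then

\[ \frac{\partial \log h}{\partial\theta} = \frac{\hat\theta - \theta}{\sigma_{\hat\theta}^2}. \]

Hence, in this case the second requirement for minimum variance is met
with $k = 1/\sigma_{\hat\theta}^2$, provided $\sigma_{\hat\theta}^2$ exists.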

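8.8. Other Forms of I. We can write the integrand of $I$ in several
additional ways. Since

\[ \frac{\partial \log f}{\partial\theta} = \frac{1}{f}\,\frac{\partial f}{\partial\theta}, \]

the integrand may be written as

\[ \frac{1}{f}\left(\frac{\partial f}{\partial\theta}\right)^2. \]

Also,

\[ \frac{\partial^2 \log f}{\partial\theta^2} = \frac{1}{f}\,\frac{\partial^2 f}{\partial\theta^2} - \frac{1}{f^2}\left(\frac{\partial f}{\partial\theta}\right)^2. \]

Hence

\[ \int_{-\infty}^{\infty} \left(\frac{\partial \log f}{\partial\theta}\right)^2 f\,dX
 = -\int_{-\infty}^{\infty} \frac{\partial^2 \log f}{\partial\theta^2}\, f\,dX
 + \int_{-\infty}^{\infty} \frac{\partial^2 f}{\partial\theta^2}\,dX. \]

But the last integral on the right is zero; hence

\[ I = -n \int_{-\infty}^{\infty} \frac{\partial^2 \log f}{\partial\theta^2}\, f\,dX. \]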



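Example 8.6. Consider a random sample of $n$ from $N(\mu,\sigma^2)$. The
joint frequency distribution is

\[ \prod_{i=1}^{n} f(X_i;\mu,\sigma^2) = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}
   \exp\left[-\frac{\sum_{i=1}^{n} (X_i - \mu)^2}{2\sigma^2}\right]. \]

Since

\[ \sum_{i=1}^{n} (X_i - \mu)^2 = \sum_{i=1}^{n} (X_i - \bar X)^2 + n(\bar X - \mu)^2, \]

then

\[ \prod_{i=1}^{n} f(X_i;\mu,\sigma^2) = \frac{1}{(\sqrt{2\pi}\,\sigma)^n}
   \exp\left[-\frac{\sum_{i=1}^{n} (X_i - \bar X)^2}{2\sigma^2}\right]
   \exp\left[-\frac{n(\bar X - \mu)^2}{2\sigma^2}\right]. \]

Then we may make the identifications:

\[ h(\hat\theta;\theta) = h(\bar X;\mu) = \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma}\, e^{-n(\bar X - \mu)^2/2\sigma^2} \]

and

\[ g(X_1,X_2,\ldots,X_n|\hat\theta) = g(X_1,X_2,\ldots,X_n|\bar X)
 = \frac{1}{\sqrt{n}\,(\sqrt{2\pi}\,\sigma)^{n-1}}
   \exp\left[-\frac{\sum_{i=1}^{n} (X_i - \bar X)^2}{2\sigma^2}\right]. \]

But the function $g$ is independent of $\mu$; hence, $\bar X$ is a sufficient
estimator of $\mu$.

Also since

\[ \log [f(X;\mu)] = -\log (\sigma\sqrt{2\pi}) - \frac{(X - \mu)^2}{2\sigma^2}, \]

then

\[ I = n \int_{-\infty}^{\infty} \frac{(X - \mu)^2}{\sigma^4}\, f(X;\mu)\,dX = \frac{n}{\sigma^2}. \]

Hence the minimum variance is $\sigma^2/n$, which is the variance of $\bar X$.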

Example 8.7. Suppose that the sample of $n$ is taken from a Poisson
population so that

\[ f(X;m) = \frac{e^{-m} m^X}{X!}. \]

The probability of obtaining the sample, that is, the joint frequency
function, is

\[ \prod_{i=1}^{n} f(X_i;m) = \frac{e^{-nm}\, m^{\sum_{i=1}^{n} X_i}}{\prod_{i=1}^{n} X_i!}. \]

Since we may set

\[ h(\bar X;m) = (m^{\bar X} e^{-m})^n, \]

$\bar X = \sum_{i=1}^{n} X_i/n$ is a sufficient estimator for $m$. Also,
$\bar X$ is unbiased, since $E(\bar X) = \sum_{i=1}^{n} E(X_i)/n = m$. Again,
since

\[ \log f(X;m) = -m + X \log m - \log X!, \]

\[ I = n \sum_{X=0}^{\infty} \left(\frac{X - m}{m}\right)^2 f(X;m) = \frac{n}{m}. \]

Hence the minimum variance is $m/n$, which is the variance of $\bar X$.
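The sum defining $I$ for the Poisson case can be evaluated numerically as a check on the value $n/m$. The following is a minimal sketch, not part of the original text; the function names and the truncation of the infinite sum at 200 terms are assumptions made for illustration.

```python
import math

def poisson_pmf(x, m):
    """f(x; m) = exp(-m) m^x / x!, computed on the log scale for stability."""
    return math.exp(-m + x * math.log(m) - math.lgamma(x + 1))

def information(n, m, terms=200):
    """I = n * sum_x (d log f / dm)^2 f(x; m), where d log f / dm = x/m - 1."""
    return n * sum((x / m - 1.0) ** 2 * poisson_pmf(x, m) for x in range(terms))

n, m = 10, 2.5
I = information(n, m)
print("I =", I, " n/m =", n / m)
print("minimum variance 1/I =", 1.0 / I, " m/n =", m / n)
```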


EXERCISES 

8.1. Show that $(s')^2 = \sum (X - \bar X)^2/n$ is a biased estimator for
$\sigma^2$.

8.2. Given a random sample of $n$ from $N(0,\sigma^2)$.

(a) Show that $s^2 = \sum X^2/n$ is an unbiased estimator for $\sigma^2$.
Hint: $s^2 = (\sigma^2/n)\chi^2$.

(b) Show that $\sigma_{s^2}^2 = 2\sigma^4/n$. Hint:

\[ \sigma_{s^2}^2 = E[(s^2 - \sigma^2)^2] = E[(s^2)^2] - (\sigma^2)^2. \]

Use $E[(s^2)^2] = (\sigma^4/n^2)E[(\chi^2)^2]$ to complete the demonstration.

(c) Is $s^2$ also consistent? Hint: Use the results of (b) and
Tchebysheff's inequality.

(d) Using the concept of information, $I$, show that the minimum
variance of $s^2$ is $2\sigma^4/n$. Is $s^2$ efficient in small samples?

(e) Is $s^2$ a sufficient estimator for $\sigma^2$?

8.3. (a) Show that $h(\bar X;m)$ for a sample from a Poisson distribution
meets the conditions given for minimum variance. Hint: From Example
8.7, $h(\bar X;m) = (m^{\bar X} e^{-m})^n$. Now, show that
$\partial \log h/\partial m = (n/m)(\bar X - m)$.

(b) Is $\bar X$ consistent in this case?

8.4. Suppose $N$ experiments consisting of $n$ trials each are conducted
with data assumed binomially distributed with constant probability $p$.
That is,

\[ f(X;p) = C(n,X)\,p^X q^{n-X}, \qquad 0 \leq X \leq n. \]

We estimate $p$ by $\bar X = \sum X/Nn$. Discuss this estimator as to:

(a) Sufficiency, by showing that
$h(\bar X;p) = [p^{\bar X}(1 - p)^{1-\bar X}]^{Nn}$.

(b) Bias.

(c) Information, by showing $I = Nn/pq$.

(d) Consistency, by using $\sigma_{\bar X}^2 = pq/Nn$ and Tchebysheff's
inequality.

8.5. Given a sample of $n$ taken from the Cauchy distribution

\[ f(X;\mu) = \frac{1}{\pi[1 + (X - \mu)^2]}. \]

(a) It can be shown that $\bar X$ has the same distribution as $X$. Hence
what can be said about the usefulness of $\bar X$ as an estimate of $\mu$ in
this case?

(b) Using the concept of information, show that the minimum variance
of $\hat\mu$ is $2/n$.

(c) Show that the asymptotic variance of the median for this distribution
is $\pi^2/4n$. What is its asymptotic efficiency?

(d) Is the median consistent in this case?

8.6. R. A. Fisher has shown that



where $s^2 = \sum (X - \bar X)^2/(n - 1)$ and the sample is from an arbitrary
universe. Show that $s^2$ is a consistent estimator for $\sigma^2$.
CHAPTER 9


PRINCIPLES OF POINT ESTIMATION:

MAXIMUM LIKELIHOOD

9.1. Introduction. Several principles of estimation, leading to routine
mathematical procedures, have been proposed for obtaining “good”
estimators. These include the principle of moments, minimum chi-square,
the method of least squares, and the principle of maximum likelihood.
The application of these principles in particular cases will lead to
estimators which may differ and hence possess different attributes of
“goodness.” A principle much in use, yielding estimators with many desirable
attributes of “goodness” obtained by easily applied routine mathematical
procedures, is that of maximum likelihood devised by R. A. Fisher. In
this chapter we shall discuss this important principle of estimation in
detail.

9.2. Basic Theory. The procedure for determining the maximum-likelihood
(ML) estimator for a population parameter $\theta$ is as follows:

(a) Determine the distribution function of the sample,
$f(X_1,X_2,\ldots,X_n;\theta)$. This is the probability of obtaining the
particular sample for discrete variates and is this probability without the
differentials for continuous variates. R. A. Fisher has called this the
likelihood of the sample.

(b) Determine $L = \log f(X_1,X_2,\ldots,X_n;\theta)$.

(c) Determine the value of $\theta$ which will maximize $L$ by solving the
equation

\[ \frac{\partial L}{\partial\theta} = 0. \]

This will also maximize the likelihood. From the previous chapter we
recall that a sufficient estimator for $\theta$ exists when the joint
distribution of the sample may be put in the form

\[ f(X_1,X_2,\ldots,X_n;\theta) = g(X_1,X_2,\ldots,X_n|\hat\theta) \cdot h(\hat\theta;\theta), \]

where $g(X_1,X_2,\ldots,X_n|\hat\theta)$ does not involve $\theta$. Hence the
ML equation reduces to

\[ \frac{\partial \log [h(\hat\theta;\theta)]}{\partial\theta} = 0. \]

But any solution of this equation can depend only on $\hat\theta$; hence,
this ML solution must be the sufficient estimator, as it depends on no other
estimator.
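Steps (a) to (c) can also be carried out numerically when the ML equation is awkward to solve. The sketch below is an illustration only, not part of the original text: it uses the Poisson likelihood of Example 8.7, a made-up sample, and a golden-section search (a technique chosen here for convenience, with the hypothetical helper name `maximize`) in place of solving $\partial L/\partial\theta = 0$ analytically.

```python
import math

def poisson_log_likelihood(m, xs):
    """L = log f(X_1, ..., X_n; m) for a Poisson sample."""
    return sum(-m + x * math.log(m) - math.lgamma(x + 1) for x in xs)

def maximize(func, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal function on [lo, hi]."""
    inv_phi = (math.sqrt(5.0) - 1.0) / 2.0
    a, b = lo, hi
    c = b - inv_phi * (b - a)
    d = a + inv_phi * (b - a)
    while b - a > tol:
        if func(c) > func(d):
            b = d          # the maximum lies in [a, d]
        else:
            a = c          # the maximum lies in [c, b]
        c = b - inv_phi * (b - a)
        d = a + inv_phi * (b - a)
    return (a + b) / 2.0

xs = [2, 0, 3, 1, 4, 2, 1, 3]      # illustrative data, not from the text
m_hat = maximize(lambda m: poisson_log_likelihood(m, xs), 0.01, 20.0)
print("numerical ML estimate:", m_hat)
print("sample mean:          ", sum(xs) / len(xs))
```

The numerical maximum agrees with the sample mean, which Example 8.7 identified as the sufficient estimator for $m$.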

9.3. Maximum-likelihood and Efficient Estimators in Small Samples.
If an efficient estimator for small samples exists (has minimum variance),
the ML estimator, adjusted for bias if necessary, will be efficient.

In order that $\hat\theta$ have minimum variance, we know from the previous
chapter that $\hat\theta$ must be an unbiased sufficient estimator and also
that

\[ \frac{\partial \log [h(\hat\theta;\theta)]}{\partial\theta} = k(\hat\theta - \theta). \]

But to obtain the ML estimator we set this, or what amounts to the same
thing,

\[ \frac{\partial \log [h(\hat\theta;\theta)]}{\partial\theta}, \]

equal to zero. Hence, $\hat\theta$, the unbiased ML estimator, is an
efficient estimator in small samples. It follows from the previous chapter
that the minimum variance that an unbiased ML estimator can have is
$1/I(\theta)$. If the ML estimator, $\hat\theta$, is biased in such a way
that $E(\hat\theta) = \theta + b(\theta)$, the minimum variance is

\[ \frac{[1 + db(\theta)/d\theta]^2}{I(\theta)}. \]

If the ML estimator is biased, we may adjust it for bias and the minimum
variance of the adjusted $\hat\theta$ will be $1/I(\theta)$.

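9.4. Maximum-likelihood Estimators for Two or More Parameters.
If there are $h$ parameters for which estimators are desired, we solve the
set of $h$ equations

\[ \frac{\partial L}{\partial\theta_i} = 0, \qquad i = 1, 2, \ldots, h. \]

In the case of two parameters, $\theta_1$ and $\theta_2$, Cramér
(reference 1) has shown that the minimum variance for $\hat\theta_i$ is

\[ \frac{1}{(1 - \rho^2)\,I(\theta_i)}, \]

where

\[ \rho^2 = \frac{n^2 \left[\displaystyle\int_{-\infty}^{\infty} \frac{\partial \log f}{\partial\theta_1}\,\frac{\partial \log f}{\partial\theta_2}\, f\,dX\right]^2}{I(\theta_1)\,I(\theta_2)}. \]

The joint efficiency in small samples is given by Cramér (reference 1) as

\[ \frac{1}{(1 - \rho^2)^2} \cdot \frac{1}{I(\theta_1)\,I(\theta_2)\,\sigma_{\hat\theta_1}^2\,\sigma_{\hat\theta_2}^2}. \]

The asymptotic efficiency $E_f$ is given by the limiting value of this joint
small-sample efficiency as $n \to \infty$. Similar formulas may be derived
for $h > 2$.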

9.5. Examples Using the Principle of Maximum Likelihood

Example 9.1. Consider a random sample of $n$ from $N(\mu,\sigma^2)$. In this
case

\[ L = \log f(X_1,X_2,\ldots,X_n;\mu,\sigma^2)
 = -\frac{1}{2}\left[n(\log 2\pi + \log \sigma^2) + \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\sigma^2}\right]. \]

Let us study the following three cases:

(a) $\sigma^2$ known

\[ \frac{\partial L}{\partial\mu} = 0: \qquad \sum_{i=1}^{n} (X_i - \hat\mu) = 0, \qquad \hat\mu = \frac{\sum_{i=1}^{n} X_i}{n} = \bar X, \]

and

\[ \sigma_{\bar X}^2 = \frac{\sigma^2}{n} \quad \text{(the minimum variance)}. \]

(b) $\mu$ known

\[ \frac{\partial L}{\partial(\sigma^2)} = 0: \qquad \frac{n}{\hat\sigma^2} = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{(\hat\sigma^2)^2}, \qquad
   \hat\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{n}, \]

\[ \sigma_{\hat\sigma^2}^2 = \frac{2\sigma^4}{n} \quad \text{(also the minimum variance)}. \]

Note that $s^2 = \sum (X_i - \bar X)^2/(n - 1)$ in this case is not completely
efficient in small samples since

\[ \sigma_{s^2}^2 = \frac{2\sigma^4}{n - 1}. \]

(c) Both $\mu$ and $\sigma^2$ unknown

\[ \hat\mu = \bar X, \qquad \hat\sigma^2 = \frac{\sum_{i=1}^{n} (X_i - \bar X)^2}{n}. \]

In case (c), $\hat\sigma^2$ is biased, and we would use $s^2$ as the unbiased
estimator for $\sigma^2$. It has previously been shown that $\bar X$ and
$s^2$ are independently distributed. The two estimators $\bar X$ and $s^2$
are sufficient since

\[ f(X_1,X_2,\ldots,X_n;\mu,\sigma^2) = g(X_1,X_2,\ldots,X_n) \cdot h_1(\bar X;\mu,\sigma^2) \cdot h_2(s^2;\mu,\sigma^2), \]

where $g(X_1,X_2,\ldots,X_n)$ is independent of $\mu$ and $\sigma^2$ (in this
case, also independent of $\bar X$ and $s^2$).

We see that $\bar X$ is still efficient in small samples with
$\sigma_{\bar X}^2 = \sigma^2/n$. However, $s^2$ is not quite efficient, since
$\sigma_{s^2}^2 = 2\sigma^4/(n - 1)$ and the minimum variance in small
samples is still $2\sigma^4/n$ ($\bar X$ and $s^2$ are independently
distributed). Hence the efficiency of $s^2$ is $(n - 1)/n$, which approaches
1 as $n \to \infty$. The joint small-sample efficiency is also $(n - 1)/n$.

To show that the minimum variance of $s^2$ is $1/I(\sigma^2)$, consider the
following:

\[ E(\hat\sigma^2) = \frac{(n - 1)\sigma^2}{n} = \sigma^2 - \frac{\sigma^2}{n}. \]

Then $\sigma^2 - (\sigma^2/n)$ is of the form $\theta + b(\theta)$, with
$db(\theta)/d\theta = -1/n$. But $s^2 = n\hat\sigma^2/(n - 1)$, and hence the
minimum variance of $s^2$ is

\[ \left(\frac{n}{n - 1}\right)^2 \left(1 - \frac{1}{n}\right)^2 \frac{1}{I(\sigma^2)} = \frac{1}{I(\sigma^2)}. \]
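The estimators of case (c) and the adjustment for bias are easily computed for a given sample. The following is a minimal sketch, not part of the original text; the function name `normal_ml` and the four sample values are illustrative assumptions.

```python
import statistics

def normal_ml(xs):
    """ML estimates for N(mu, sigma^2) with both parameters unknown (case c)."""
    n = len(xs)
    mu_hat = sum(xs) / n
    sigma2_hat = sum((x - mu_hat) ** 2 for x in xs) / n   # biased ML estimator
    s2 = n * sigma2_hat / (n - 1)                         # adjusted for bias
    return mu_hat, sigma2_hat, s2

xs = [4.0, 7.0, 5.0, 8.0]          # illustrative data, not from the text
mu_hat, sigma2_hat, s2 = normal_ml(xs)
n = len(xs)

print("mu_hat =", mu_hat)
print("sigma2_hat =", sigma2_hat, " s2 =", s2)
# Efficiency of s2 in small samples, (n - 1)/n, as derived in Example 9.1.
print("efficiency of s2 =", (n - 1) / n)
```

Here $s^2$ coincides with the usual sample variance with divisor $n - 1$, and the efficiency $(n - 1)/n$ is .75 for $n = 4$.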

Example 9.2. Given $N$ random samples consisting of $n$ trials each
from a population assumed binomially distributed with constant
probability, $p$, so that

\[ f(x;p) = C(n,x)\,p^x (1 - p)^{n-x}. \]

Then,

\[ f(x_1,x_2,\ldots,x_N;p) = \prod_{i=1}^{N} C(n,x_i)\; p^{\sum_{i=1}^{N} x_i}\, (1 - p)^{\sum_{i=1}^{N} (n - x_i)} \]

and

\[ L = \sum_{i=1}^{N} \log C(n,x_i) + \sum_{i=1}^{N} x_i \log p + \sum_{i=1}^{N} (n - x_i) \log (1 - p). \]

Hence

\[ \frac{\partial L}{\partial p} = 0: \qquad
   \hat p = \frac{\sum_{i=1}^{N} x_i}{\sum_{i=1}^{N} (n - x_i) + \sum_{i=1}^{N} x_i}
          = \frac{\sum_{i=1}^{N} x_i}{Nn} = \bar X. \]
EXERCISES

9.1. Determine the ML estimator for $m$ in the Poisson distribution.

9.2. Given the Pearson Type III distribution

\[ f(X;a) = \frac{X^{\lambda-1} e^{-X/a}}{\Gamma(\lambda)\,a^{\lambda}}, \qquad 0 < X < \infty, \]

where $\lambda$ is some fixed positive constant. (a) Show that the ML
estimator for $a$ for a sample of $n$ is $\hat a = \bar X/\lambda$. (b) It
can be shown that $\sigma_{\hat a}^2 = a^2/n\lambda$. Using the concept of
information, show that the minimum variance is also given by
$a^2/n\lambda$. (c) Is the estimator consistent? (d) Is the estimator
efficient in small samples? (e) Show that $E(X) = a\lambda$, hence that
$E(\hat a) = a$ and $\hat a$ is unbiased. (f) Show that

\[ f(X_1,X_2,\ldots,X_n;a) = \frac{\prod_{i=1}^{n} X_i^{\lambda-1}}{[\Gamma(\lambda)]^n} \cdot \frac{e^{-n\lambda\hat a/a}}{a^{n\lambda}} \]

and hence $\hat a$ is a sufficient estimator for $a$.

9.3. Derive for random samples of $n$ the ML estimators of $\mu$, $\beta$,
and $\sigma^2$ in the regression equation $Y = \mu + \beta x + e$, where $e$
is $N(0,\sigma^2)$. (a) What are the variances of $\hat\mu$ and
$\hat\beta$? (b) Are these efficient? (c) Is the ML estimator for
$\sigma^2$ unbiased?

9.4. Given the bivariate normal distribution with parameters: $\mu_x$,
$\mu_y$, $\sigma_x^2$, $\sigma_y^2$, $\rho_{xy}$. (a) Find the ML estimators
for each parameter for random samples of $n$. (b) Show that
$(\hat\rho)^2 = (\hat\beta)^2 \cdot \hat\sigma_x^2/\hat\sigma_y^2$, where
$\hat\beta$ is obtained from Exercise 9.3. (c) Is there any difference
between $\hat\sigma_y^2$ above and the $\hat\sigma^2$ of Exercise 9.3?
CHAPTER 10


INTERVAL ESTIMATION

10.1. Introduction. In Sec. 6.1 we introduced a method of estimating
the value of a parameter, $\theta$, which is less restrictive than that of
point estimation. This method, called interval estimation, derives limits
$c_1$ and $c_2$ which are functions of the sample values $\{X_i\}$ or
functions of the sample values and known population parameters. We recall
that the interval $(c_1,c_2)$ is called the confidence interval, which is
determined so that, in repeated sampling from the same population, the
interval will contain the parameter $\theta$ a certain percentage of the
time. Symbolically $c_1$ and $c_2$ are determined so that

\[ P(c_1 < \theta < c_2) \geq 1 - \alpha, \]

that is, the probability of the interval $(c_1,c_2)$ containing the
population parameter $\theta$ is greater than or equal to $1 - \alpha$. The
value $\alpha$ is a small positive number less than 1. The probability holds
only for a large number of similarly drawn samples, where $c_1$ and $c_2$ are
recalculated for each sample. The value $(1 - \alpha)$ is called the
confidence probability. In Chap. 7 methods and examples were given
introducing confidence-interval construction in connection with the
derivations of certain important sampling distributions.

These concepts were introduced by Neyman (reference 1). R. A. Fisher uses
the terms fiducial interval and fiducial probability to indicate
substantially the same concepts, though he restricts his results to
sufficient statistics.

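10.2. Intervals for Sufficient Estimators. If $\hat\theta$ is a sufficient
estimator such that

\[ \prod_{i=1}^{n} f(X_i;\theta) = g(X_1,X_2,\ldots,X_n|\hat\theta) \cdot h(\hat\theta;\theta), \]

then the problem of estimating the confidence limits becomes one of
finding the limits $\gamma_1(\theta)$ and $\gamma_2(\theta)$ such that

\[ \int_{\gamma_1(\theta)}^{\gamma_2(\theta)} h(\hat\theta;\theta)\,d\hat\theta = 1 - \alpha. \]

It follows that

\[ P[\gamma_1(\theta) < \hat\theta < \gamma_2(\theta)] = 1 - \alpha. \]

We then solve the equations $\gamma_1(\theta) = \hat\theta$ and
$\gamma_2(\theta) = \hat\theta$ for $\theta$ and obtain the solutions $c_2$
and $c_1$, respectively.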
Let us consider the problem graphically. In Fig. 10.1 the curves
$\gamma_1(\theta)$ and $\gamma_2(\theta)$ are drawn so that
$P[\gamma_1(\theta) < \hat\theta < \gamma_2(\theta)] = 1 - \alpha$ if
$f(X_i;\theta)$ is continuous, and $P \geq 1 - \alpha$ if $f(X_i;\theta)$ is
discrete. Let $c_1$ and $c_2$ be the intersections of the straight line
$\hat\theta = \hat\theta_0$ with the curves $\gamma_2(\theta)$ and
$\gamma_1(\theta)$, respectively (note the interchange of subscripts).
$\hat\theta_0$ is the particular value of $\hat\theta$ obtained from the
sample. The line segment $(c_1,c_2)$ will intersect the line
$\theta = \theta_0$ (the true value of the parameter) only if $\hat\theta$
falls between $\gamma_{10}$ and $\gamma_{20}$. But the probability of the
latter event is

\[ P(\gamma_{10} < \hat\theta < \gamma_{20}) = 1 - \alpha. \]

Hence, $1 - \alpha$ is also the probability that the variable interval
$(c_1,c_2)$ includes $\theta_0$.

Fig. 10.1. Graphical representation of confidence interval.

We may summarize as follows:

\[ 1 - \alpha = P(\gamma_{10} < \hat\theta < \gamma_{20} \,|\, \theta = \theta_0)
 = P(c_1c_2 \text{ intersects } \theta = \theta_0) = P(c_1 < \theta_0 < c_2). \]

This does not imply that $\theta$ has a distribution or that on a given trial
there is a probability that the true value $\theta_0$ lies between $c_1$ and
$c_2$. What is meant is that, if a series of trials are made, in about
$100(1 - \alpha)$ per cent of these trials the variable interval $(c_1,c_2)$
will include $\theta_0$, the true value of $\theta$.

10.3. Shortest Confidence Interval. It is clear that there are an
infinity of possible limits $(\gamma_1,\gamma_2)$ such that
$P(\gamma_1 < \hat\theta < \gamma_2) = 1 - \alpha$. For example, we might
take $\gamma_1 = -\infty$ so that $P(\hat\theta > \gamma_2) = \alpha$, or we
might take $P(\hat\theta > \gamma_2) = P(\hat\theta < \gamma_1) = \alpha/2$.†
In determining which of the infinity of possible limits
$(\gamma_1,\gamma_2)$ to use, we shall usually wish to make the confidence
interval as small as possible. If we consider only unbiased estimates which
are asymptotically normally distributed, this interval can be made as small
as possible by choosing the $\hat\theta$ with the smallest variance and
selecting the limits $(\gamma_1,\gamma_2)$ such that

\[ \alpha_1 = \alpha_2 = \frac{\alpha}{2}. \]

The ML estimator possesses this desired property. For detailed
discussions of this topic, see Neyman (reference 2) and Wilks (reference 3).

† It will be convenient, in general, to let
$P(\hat\theta > \gamma_2) = \alpha_2$ and $P(\hat\theta < \gamma_1) = \alpha_1$,
where $\alpha_1 + \alpha_2 = \alpha$.

10.4. More than One Unknown Parameter. If the frequency distribution
is a function of several unknown parameters $\{\theta_i\}$,
$i = 1, 2, \ldots, h$, there is a confidence interval for one of these
parameters, say $\theta_1$, if a function of the sample
$(X_1,X_2,\ldots,X_n)$ and $\theta_1$, $\phi(X_1,X_2,\ldots,X_n;\theta_1)$,
can be found such that

\[ \int_{\phi_1}^{\phi_2} f(\phi)\,d\phi = 1 - \alpha, \]

where $\phi_1$ and $\phi_2$ are numerical values of
$\phi = \phi(X_1,X_2,\ldots,X_n;\theta_1)$ and $\phi$ is independent of the
other $\{\theta_i\}$. In this case

\[ P[\phi_1 < \phi(X_1,X_2,\ldots,X_n;\theta_1) < \phi_2] = 1 - \alpha. \]

By reversing the inequalities $\phi < \phi_2$ and $\phi > \phi_1$ we can find
values of $c_1$ and $c_2$ such that

\[ P[c_1 < \theta_1 < c_2] = 1 - \alpha, \]

where $c_1$ and $c_2$ are functions of $(X_1,X_2,\ldots,X_n)$ and also
$\phi_2$ and $\phi_1$, respectively.

The problem of finding a function $\phi$ which is independent of the other
parameters is often quite difficult. The use of the usual $t$, used in $t$
tests, to solve the Behrens-Fisher problem is an example of this. Hotelling
introduced the term nuisance parameters to apply to these other parameters
which appear in the distribution of the statistic but which we wish to
eliminate when making statements concerning confidence limits for one of
the parameters.

Example 10.1. It is desired to estimate the 95 per cent confidence
interval ($\alpha = .05$) for the mean $\mu$ of a $N(\mu,\sigma^2)$
population by use of a random sample of $n$. We consider the following cases:

(a) $\sigma^2$ known. The ML estimator of $\mu$ is
$\bar X = \sum_{i=1}^{n} X_i/n$. Since $\bar X$ is $N(\mu,\sigma^2/n)$, it is
evident that the shortest interval will be obtained by letting
$\alpha_1 = \alpha_2 = .025$, that is,

\[ \int_{-\infty}^{\gamma_1} h(\bar X;\mu)\,d\bar X = \int_{\gamma_2}^{\infty} h(\bar X;\mu)\,d\bar X = .025. \]

This is true because of the concentration of the probability, or area under
the curve, about $\mu$ for the symmetrical normal distribution.

Now, the integral on the right above becomes

\[ \int_{\gamma_2}^{\infty} \frac{\sqrt{n}}{\sigma\sqrt{2\pi}}\, e^{-n(\bar X - \mu)^2/2\sigma^2}\,d\bar X = .025. \]

Let $T = \sqrt{n}\,(\bar X - \mu)/\sigma$, and the integral becomes

\[ \int_{\sqrt{n}(\gamma_2 - \mu)/\sigma}^{\infty} N(0,1)\,dT = .025. \]

The value of the lower limit of this integral may be obtained from a table
of areas for the normal curve. It will be found that

\[ \frac{\sqrt{n}\,(\gamma_2 - \mu)}{\sigma} = 1.96, \]

and hence,

\[ \gamma_2 = \mu + \frac{1.96\sigma}{\sqrt{n}}. \]

Similarly,

\[ \gamma_1 = \mu - \frac{1.96\sigma}{\sqrt{n}}. \]

It follows that

\[ P\left[\mu - \frac{1.96\sigma}{\sqrt{n}} < \bar X < \mu + \frac{1.96\sigma}{\sqrt{n}}\right] = .95. \]

Reversing the positions of $\mu$ and $\bar X$, we have

\[ P\left[\bar X - \frac{1.96\sigma}{\sqrt{n}} < \mu < \bar X + \frac{1.96\sigma}{\sqrt{n}}\right] = .95. \]

Note that $c_1 = \bar X - (1.96\sigma/\sqrt{n})$ corresponds to
$\gamma_2 = \mu + (1.96\sigma/\sqrt{n})$ and similarly for $c_2$ and
$\gamma_1$; that is, if

\[ \bar X < \gamma_2 = \mu + \frac{1.96\sigma}{\sqrt{n}}, \]

then

\[ \mu > \bar X - \frac{1.96\sigma}{\sqrt{n}} = c_1. \]

Unless the nuisance parameter $\sigma$ is known, these confidence limits are
of little use. This result is the same as that obtained in Sec. 7.2.
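The repeated-sampling interpretation of these limits can be illustrated by a sampling experiment. The sketch below is not part of the original text: the values $\mu = 10$, $\sigma = 2$, $n = 16$, the number of trials, and the seed are arbitrary assumptions, and `z_interval` and `coverage` are hypothetical helper names.

```python
import math
import random

def z_interval(xs, sigma, z=1.96):
    """Limits c1, c2 = xbar -/+ z*sigma/sqrt(n) for known sigma."""
    n = len(xs)
    xbar = sum(xs) / n
    half_width = z * sigma / math.sqrt(n)
    return xbar - half_width, xbar + half_width

def coverage(mu, sigma, n, reps=20000, seed=2):
    """Fraction of repeated samples whose interval (c1, c2) includes mu."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(reps):
        xs = [rng.gauss(mu, sigma) for _ in range(n)]
        c1, c2 = z_interval(xs, sigma)
        hits += c1 < mu < c2
    return hits / reps

cov = coverage(mu=10.0, sigma=2.0, n=16)
print("observed coverage:", cov)
```

The observed fraction should be close to .95; it is the interval, recalculated for each sample, that varies, while $\mu$ stays fixed.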

(b) $\sigma^2$ unknown. In this case we are concerned with two unknown
parameters, $\mu$ and $\sigma^2$. It is known that
$t = \sqrt{n}\,(\bar X - \mu)/s$ is distributed as Student's $t$ with
$(n - 1)$ degrees of freedom. Hence $t$ is a function of only $\mu$ and the
sample values $n$, $\bar X$, and $s$. It is possible then to find
numerical values $t_1$ and $t_2$ such that

\[ P[t_1 < t < t_2] = .95. \]

Since $t$ has a symmetrical distribution with its maximum density in the
center, the shortest confidence interval with $\alpha = .05$ will result from
setting

\[ P(t > t_2) = P(t < t_1) = .025. \]

In this case $t_1 = -t_2$. Since the values of $t_1$ and $t_2$ depend on
$n$, we cannot find unique confidence limits as in (a) above. For $n = 4$
(3 degrees of freedom), $-t_1 = t_2 = 3.182$, and for $n = 20$ (19 degrees of
freedom) $-t_1 = t_2 = 2.093$.

The reverse limits are obtained as follows: For
$t = \sqrt{n}\,(\bar X - \mu)/s < t_2$, $\mu > \bar X - t_2 s/\sqrt{n}$, and
similarly for $t > t_1 = -t_2$ we obtain $\mu < \bar X + t_2 s/\sqrt{n}$.
Hence

\[ P\left[\bar X - \frac{t_2 s}{\sqrt{n}} < \mu < \bar X + \frac{t_2 s}{\sqrt{n}}\right] = .95, \]

where .95 is the confidence probability. The confidence limits are now
independent of the nuisance parameter $\sigma$ since $t$ was used instead of
$T$ as in Sec. 7.10.
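The limits of case (b) may be computed directly once $t_2$ has been read from a table. The following is a minimal sketch, not part of the original text: the dictionary holds only the two tabular values quoted above, the sample is made up for illustration, and `t_interval` is a hypothetical helper name.

```python
import math
import statistics

# Upper 2.5 per cent points of Student's t quoted in the text,
# keyed by degrees of freedom (n - 1).
T_UPPER_025 = {3: 3.182, 19: 2.093}

def t_interval(xs):
    """95 per cent limits xbar -/+ t2*s/sqrt(n) when sigma is unknown."""
    n = len(xs)
    t2 = T_UPPER_025[n - 1]          # only n = 4 and n = 20 are tabulated here
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)         # divisor n - 1
    half_width = t2 * s / math.sqrt(n)
    return xbar - half_width, xbar + half_width

xs = [4.0, 7.0, 5.0, 8.0]            # illustrative data, not from the text
c1, c2 = t_interval(xs)
print("95 per cent limits for mu:", round(c1, 3), round(c2, 3))
```

With so few degrees of freedom the interval is much wider than the corresponding interval with $\sigma$ known, since 3.182 replaces 1.96.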

Example 10.2. In Example 10.1(b), determine a 95 per cent confidence
interval for $\sigma^2$. We know that $\chi^2 = (n - 1)s^2/\sigma^2$ is
distributed as chi-square with $(n - 1)$ degrees of freedom. Let $\chi_1^2$
and $\chi_2^2$ be values of $\chi^2$ such that

\[ \int_{\chi_1^2}^{\chi_2^2} f(\chi^2)\,d(\chi^2) = .95. \]

Then

\[ P[\chi_1^2 < \chi^2 < \chi_2^2] = .95, \]

and for $\chi^2 = (n - 1)s^2/\sigma^2 < \chi_2^2$,
$\sigma^2 > (n - 1)s^2/\chi_2^2$. Similarly, for $\chi^2 > \chi_1^2$,
$\sigma^2 < (n - 1)s^2/\chi_1^2$. Hence, the confidence interval for
$\sigma^2$ is

\[ \frac{(n - 1)s^2}{\chi_2^2} < \sigma^2 < \frac{(n - 1)s^2}{\chi_1^2}. \]

As for Example 10.1(b), the values of $\chi_1^2$ and $\chi_2^2$ depend on
$n$ (see Exercise 7.13).

In this case the problem of selecting values of $\chi_1^2$ and $\chi_2^2$ in
order to obtain the shortest confidence interval is more complicated because
of the skewness of the $\chi^2$ distribution. It is clear that the interval
is proportional to

\[ \frac{1}{\chi_1^2} - \frac{1}{\chi_2^2}. \]

Let us illustrate the difficulty in selecting values of $\chi_1^2$ and
$\chi_2^2$ in order to obtain the shortest confidence interval for the case
$n = 3$ (2 degrees of freedom). We have

\[ \int_{\chi_1^2}^{\chi_2^2} \tfrac{1}{2}\, e^{-\chi^2/2}\,d(\chi^2) = .95. \]

(a) If we select $\chi_1^2$ and $\chi_2^2$ so that
$\alpha_1 = \alpha_2 = .025$, then $\chi_2^2 = 7.38$, and
$\chi_1^2 = .0506$. Hence

\[ \frac{1}{\chi_1^2} - \frac{1}{\chi_2^2} = 19.8 - 0.1 = 19.7, \]

and the confidence interval is $0.2s^2 < \sigma^2 < 39.6s^2$.

(b) If we select $\chi_1^2$ and $\chi_2^2$ so that $\alpha_1 = .05$ and
$\alpha_2 = 0$, then $\chi_2^2 = \infty$ and $\chi_1^2 = .1026$. Hence

\[ \frac{1}{\chi_1^2} - \frac{1}{\chi_2^2} = 9.7 - 0 = 9.7. \]

The confidence interval now becomes

\[ 0 < \sigma^2 < 19.4s^2, \]

which is much shorter than that obtained in (a) above.

(c) By minimizing $1/\chi_1^2 - 1/\chi_2^2$ subject to the relation

\[ \int_{\chi_1^2}^{\chi_2^2} f(\chi^2)\,d(\chi^2) = 1 - \alpha, \]

it is possible to show that the values $\chi_1^2$ and $\chi_2^2$ should
satisfy the following relationship in order to provide the shortest
confidence interval for $\sigma^2$:

\[ \chi_1^4 f(\chi_1^2) = \chi_2^4 f(\chi_2^2). \]
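For $n = 3$ the chi-square integral has the closed form $1 - e^{-\chi^2/2}$, so the limits in cases (a) and (b) can be recomputed without tables. The sketch below is an illustration only, not part of the original text; the function names are hypothetical, and the results differ slightly from the figures above because the text rounds $1/\chi_1^2$ and $1/\chi_2^2$ before doubling.

```python
import math

def chi2_2df_quantile(p):
    """Value x with P(chi-square < x) = p for 2 d.f., from 1 - exp(-x/2) = p."""
    return -2.0 * math.log(1.0 - p)

def variance_interval_n3(s2, alpha1, alpha2):
    """Limits (n-1)s2/chi2_2 < sigma^2 < (n-1)s2/chi2_1 for n = 3."""
    chi2_1 = chi2_2df_quantile(alpha1)                  # lower tail area alpha1
    chi2_2 = chi2_2df_quantile(1.0 - alpha2) if alpha2 > 0 else math.inf
    return 2.0 * s2 / chi2_2, 2.0 * s2 / chi2_1

# Case (a): equal tails; case (b): whole of alpha in the lower tail.
lo_a, hi_a = variance_interval_n3(1.0, 0.025, 0.025)
lo_b, hi_b = variance_interval_n3(1.0, 0.05, 0.0)
print("(a) limits in units of s^2:", round(lo_a, 2), round(hi_a, 2))
print("(b) limits in units of s^2:", round(lo_b, 2), round(hi_b, 2))
```

The one-sided choice in (b) roughly halves the length of the interval, as noted above.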


EXERCISES 


10.1. Given a sample of $n$ from a binomial population with constant
probability $p$. If we let $\alpha_1 = \alpha_2 = .025$ and assume that the
sample estimate of $p$, $\hat p$, is approximately $N[p,\,p(1 - p)/n]$, show
that the confidence interval is $(p_1 < p < p_2)$, where $p_1$ and $p_2$ are
solutions of the quadratic equation

\[ n(\hat p - p)^2 = (1.96)^2(p - p^2). \]

10.2. Find the shortest confidence interval for the difference between
the means of two normal populations with the same variance, $\sigma^2$, with
a sample of $n_1$ from the first population and of $n_2$ from the second
population.
10.3. Show that the confidence interval for the ratio

\[ \theta = \sigma_1^2/\sigma_2^2 \]

is given by

\[ F_0/F_2 < \theta < F_0/F_1, \]

where

\[ \int_{F_1}^{F_2} f(F)\,dF = 1 - \alpha \]

tiSly,” = 4/4 ^ (Wl “ 1} and (nt ~ 1} degree * 0f freed °m, respec- 

and F^ 611 ^ = 12 (J and W2 “ 25 and “ = - 10 , determine values of F, 
and F 2 if «x = Suppose «} = 20 and «> = 10, what are the 90 per 
cent confidence limits? pKI 

limhfte tlTll^ “ f Xercise J- 35 t0 »* »P 90 per cent confidence 
limits for the following ratios, used in statistical genetics- 

(a) °T±M. 

(b) 4 

(c) 


°1 


“■f 


<rf +<r| 

10.5. (For students who have studied advanced calculus.) Derive
condition (c) of Example 10.2.

Extrcte DScr diti0n ^ ^ Sh ° rteSt °° nMeace interval ** 

10.7. (a) Derive the general confidence limits for a linear function of 
the sample values, that is, 


t = 2, <w.-, 

where the X’b are NID ( M ,<r 2 ). ’"" 

(b) Apply your result to h and h in Exercise ^ assuming f = 4 
T ° o0 > Tl — 105 > T 2 = 95, and T z = 70, and s 2 = 12. ’ 
CHAPTER 11

TESTS OF HYPOTHESES


11.1. Introduction. Statistical inference concerns itself in general
with two types of problems: estimation of population parameters and tests
of hypotheses. In the preceding three chapters, we have considered
problems of estimation. Desirable properties of a “good” estimator were
discussed; and a principle of estimation, the maximum-likelihood method,
was presented as a technique which in many cases may be easily used to
obtain estimators possessing many of these desirable properties.

In this chapter, we propose to discuss the general problem of tests of
hypotheses and present a principle, the likelihood-ratio criterion, which in
many cases will provide a “good” test criterion to be used in testing
hypotheses concerning population parameters. In discussing derived
sampling distributions in Chap. 7, it will be recalled that the distributions
of $\chi^2$, $t$, and $F$ were obtained and their uses in applied statistics
in testing hypotheses were pointed out and illustrated. In this discussion
we wish to investigate the theoretical justification for selecting a
particular test criterion for a particular problem in hand.

11.2. The General Problem. In order to test whether a given hypothesis
($H_0$, the null hypothesis) is supported by a given set of data, we must
devise a rule of procedure, depending on the outcome of calculations
obtained from the sample, to decide whether to reject or not to reject $H_0$.
For example, in testing whether or not a given sample supports the
hypothesis that the observations were randomly selected from $N(0,1)$ we
calculate $T = X$ and consider it a normal deviate with unit variance.
After a choice of an allowable error or probability of rejecting $H_0$ when
it is true, say $\alpha$, we find two regions such that, if $X$ is in one
region, we reject $H_0$ and, if in the other region, we do not reject $H_0$.
The first region will be called the region of rejection, $R$, and is defined
so that the probability of the sample falling in $R$ is $\alpha$, if $H_0$ is
true. Designate $T$ the test criterion and $\alpha$ the significance level.

As indicated in Sec. 11.1 there are many different criteria for judging 
the truth of a given hypothesis. For example, if we wish to test whether 
or not a given sample could have been drawn from NID(0,1), we could 
use any one of the following tests (and probably many more): 

(i) $T = \sqrt{n}\,\bar X$, a normal deviate with unit variance.

(ii) $t = \sqrt{n}\,\bar X/s$, Student’s test.

(iii) $\chi^2 = (n-1)s^2/1$, the $\chi^2$ test for the agreement of sample
and population variance.

(iv) Tests for skewness or kurtosis in the distribution (evidence of 
nonnormality). 

(v) Tests for serial correlation in the observations. 

The likelihood-ratio criterion, mentioned earlier, may be used to indicate 
which of the possible test criteria to use. Actually the likelihood ratio 
defines a region of rejection R, which involves computing some test 
criterion such as one of those mentioned above. 

The observations in the sample $(X_1, X_2, \dots, X_n)$ may be thought of
as representing the coordinates of a point, $S$, in $n$-dimensional space.
The space is divided into two regions: the region of rejection $R$ and the
region of nonrejection. If $S$ falls in $R$, we shall reject $H_0$; otherwise
accept it. The region $R$ corresponds to the region outside the confidence
interval discussed in the previous chapter and is defined so that the
probability of rejecting a true hypothesis (the probability of $S$ falling in
$R$ when $H_0$ is true) is the significance level $\alpha$ (for example,
$\alpha = .05$ or $.01$). This will be indicated symbolically as

$$P(S \in R \mid H_0) = \alpha,$$

where $S \in R$ means that the sample point $S$ is contained in $R$.

As with confidence intervals, there may be a large number of possible
regions $R$ which satisfy this probability statement. For purposes of
making tests of significance, it seems reasonable to select that $R$ for
which $P(S \in R)$ is maximized if the true hypothesis is not $H_0$. That is,
we want to reject the null hypothesis as often as possible when it is
not true. Hence, we are led to consider the possible alternatives to $H_0$.
Designate all alternatives as $H_a$. Symbolically we wish to maximize
$P(S \in R \mid H_a)$ for a fixed $P(S \in R \mid H_0) = \alpha$. In future
discussions we shall let $P(S \in R \mid H_a) = \beta_a$. The quantity
$\beta_a$ is called the power of the test, since it measures how powerful the
test is in indicating a true difference from $H_0$ when such a difference
actually exists. In most cases there will be an infinity of possible
alternatives $H_a$, and $\beta_a$ will be different for each $H_a$. Hence, in
general, it will not be possible to maximize the power for all alternative
hypotheses. However, in certain cases, such a test can be found and is
called a uniformly most powerful test. It can be shown that the
likelihood-ratio criterion will produce a uniformly most powerful test if
one exists.

An unbiased test is one for which the power is a minimum for $H = H_0$;
that is, we reject $H_0$ the least number of times when $H_0$ is the true
hypothesis. In case there is no over-all uniformly most powerful test, it
would appear that we should at least choose an unbiased test and, if
possible, select the uniformly most powerful test from among the unbiased
ones.

The theory of tests of hypotheses under discussion was introduced by
Neyman and Pearson. Tests of significance are considered from the
point of view of errors of the first and second kind. In making a test of
$H_0$ two types of errors may be committed: (I) we may reject $H_0$ when it
is true; (II) we may accept $H_0$ when it is false. It follows that

$$P(\mathrm{I}) = \alpha, \qquad P(\mathrm{II}) = 1 - \beta_a.$$

Maximizing the power corresponds to minimizing the probability of
committing a Type II error for a fixed probability of committing a Type I
error.

The importance of taking into account the Type II error or the power
of a test may be illustrated as follows: It is desired to determine whether
or not a new variety of corn yields more than some accepted variety.
Suppose account is taken of a Type I error only and that the significance
probability is fixed at $\alpha = .05$, which guarantees that, if there is no
real difference between the new variety and the standard, significance shall
be indicated only 5 per cent of the time. In this case it would not be
necessary to perform an experiment at all. It would suffice to draw a bead
at random from a bowl whose composition is 19 white and 1 red. If it is
white we accept $H_0$ (no difference between the varieties); if red, we
reject $H_0$. Using the procedure, we shall always reject $H_0$ 5 per cent of
the time and this will be done whether $H_0$ is true or not. The defect in
this procedure is now apparent. If there is a real difference, we shall
recognize this only 5 per cent of the time, which also, in this case, is the
power of the test. It is now clear that we need a different testing
procedure in order that the probability of detecting true differences when
they exist may be large. The $t$ test may be shown to maximize the power of
the test under certain conditions. This will be discussed later.
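The contrast between the bead procedure and a test that actually uses the
data can be sketched numerically. The sketch below is only an illustration
under stated assumptions: it uses the known-variance normal test of Sec. 11.4
(not the $t$ test), a one-sided rejection region, and $n = 25$; the function
names are hypothetical.

```python
from statistics import NormalDist

ALPHA = 0.05
Z = NormalDist()  # standard normal N(0, 1)

def bead_power(true_shift: float) -> float:
    """Chance of rejecting H0 when a red bead (1 of 20) is drawn."""
    return 1 / 20  # ignores the data, so it never depends on true_shift

def z_test_power(true_shift: float, n: int) -> float:
    """Power of the one-sided test rejecting when sqrt(n) * mean > z_alpha."""
    z_alpha = Z.inv_cdf(1 - ALPHA)
    return 1 - Z.cdf(z_alpha - n ** 0.5 * true_shift)

for shift in (0.0, 0.2, 0.4, 0.6):
    print(f"shift={shift:.1f}  bead={bead_power(shift):.2f}  "
          f"z-test={z_test_power(shift, 25):.4f}")
```

Both procedures reject a true $H_0$ 5 per cent of the time, but only the
second gains power as the true difference grows.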

Cramer⁴ has summarized the problem of selecting a good test
criterion as follows: “In order that a test of the hypothesis $H_0$ should be
judged to be good, we should accordingly require that the test has a
small probability of rejecting $H_0$ when this hypothesis is true, but a
large probability of rejecting $H_0$ when it is false. Of two tests
corresponding to a probability $\alpha$ of rejecting $H_0$ when it is true,
we should thus prefer the one that gives the largest probability of
rejecting $H_0$ when it is false.”

11.3. Types of Hypotheses. A hypothesis, $H_0$, which specifies the
values of all parameters in the population distribution is called a simple
hypothesis; in other words, a simple hypothesis specifies that the
distribution is one specific member of a family of distributions. If $H_0$
does not specify the values of all population parameters, it is called a
composite hypothesis. The hypothesis $H_0$ must be taken from a set of
admissible hypotheses, $\Omega$, which usually depends upon the form of the
distribution. For the normal parent distribution, $N(\mu,\sigma^2)$,
$\Omega$ is $-\infty < \mu < \infty$, $0 < \sigma^2 < \infty$. A composite
hypothesis then states that a distribution belongs to some subspace of the
parameter space.

11.4. Simple Hypothesis for a One-parameter Distribution. In order
to illustrate the method of determining the power curve for a test,
consider the problem of testing some hypothesis concerning the value of the
mean, $\mu$, of a normal population with unit variance, $N(\mu,1)$. The
sample observations may be $n$ differences obtained from $n$ pairs of
observations such as in the corn-variety problem mentioned earlier. In this
case the population of differences would be assumed to have unit variance.
Set up the null hypothesis $H_0: \mu = \mu_0$. The set of admissible
hypotheses is $-\infty < \mu < \infty$, $\sigma^2 = 1$. The sample mean of
the differences, $\bar X$, is to be used to test $H_0$ against some
alternative $H_a$.

From the discussion on confidence intervals, we know that the
probability of committing a Type I error is given by

$$P(\mathrm{I}) = 1 - \sqrt{\frac{n}{2\pi}} \int_{\gamma_1}^{\gamma_2}
  e^{-n(\bar X - \mu_0)^2/2}\, d\bar X = \alpha,$$

where $\gamma_2 = \mu_0 + T_2/\sqrt{n}$ and $\gamma_1 = \mu_0 + T_1/\sqrt{n}$.
It will be shown later that, for this particular problem, $\bar X$ is the
“best” test criterion. If $\alpha = .05$, the shortest confidence interval
was shown to result when $T_2 = -T_1 = 1.96$.

The integration is simplified by setting $T_0 = \sqrt{n}\,(\bar X - \mu_0)$
so that

$$\alpha = 1 - \frac{1}{\sqrt{2\pi}} \int_{T_1}^{T_2} e^{-T_0^2/2}\, dT_0.$$

Hence, if the calculated $\bar X$ is greater than $\gamma_2$ or less than
$\gamma_1$ or, what amounts to the same thing, if the value of $T_0$ for a
particular problem is greater than $T_2$ or less than $T_1$, then we say that
we have evidence that $H_0$ is false.
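As a small illustration of this decision rule, the limits $\gamma_1$ and
$\gamma_2$ can be computed for the equal-tails case $T_2 = -T_1$. This is a
sketch only; the function names and the sample means 0.45 and 0.30 are
made up for the illustration, and unit variance is assumed as in the text.

```python
from statistics import NormalDist

def rejection_limits(mu0: float, n: int, alpha: float = 0.05):
    """Return (gamma1, gamma2) for the equal-tails test of H0: mu = mu0, sigma = 1."""
    t2 = NormalDist().inv_cdf(1 - alpha / 2)   # T2 = -T1
    half_width = t2 / n ** 0.5
    return mu0 - half_width, mu0 + half_width

def reject(xbar: float, mu0: float, n: int, alpha: float = 0.05) -> bool:
    """True when the sample mean falls in the region of rejection."""
    gamma1, gamma2 = rejection_limits(mu0, n, alpha)
    return xbar < gamma1 or xbar > gamma2

g1, g2 = rejection_limits(mu0=0.0, n=25)
print(round(g1, 3), round(g2, 3))
print(reject(0.45, 0.0, 25), reject(0.30, 0.0, 25))
```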

In order to evaluate the power of this test to detect real differences,
that is, $\mu = \mu_a \ne \mu_0$, we obtain the power function. The power
function will give the probability of rejecting $H_0$ when
$\mu = \mu_a \ne \mu_0$. The function should increase as the difference
between $\mu_a$ and $\mu_0$ increases, since in such cases it would be
increasingly desirable to reject $H_0$. The problem then is to determine the
probability $\beta_a$ of obtaining $\bar X > \gamma_2$ or
$\bar X < \gamma_1$ from $N(\mu_a,1)$ for fixed $\alpha$. Note that the
power, $\beta_a$, equals $\alpha$ for $\mu_a = \mu_0$.
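Now,

$$\beta_a = 1 - \sqrt{\frac{n}{2\pi}} \int_{\gamma_1}^{\gamma_2}
    e^{-n(\bar X - \mu_a)^2/2}\, d\bar X
  = 1 - \frac{1}{\sqrt{2\pi}} \int_{T_1}^{T_2}
    e^{-[T_0 - \sqrt{n}(\mu_a - \mu_0)]^2/2}\, dT_0$$

$$\phantom{\beta_a} = \frac{1}{\sqrt{2\pi}} \int_{T_{2a}}^{\infty} e^{-T_a^2/2}\, dT_a
  + \frac{1}{\sqrt{2\pi}} \int_{-\infty}^{T_{1a}} e^{-T_a^2/2}\, dT_a,$$

where

$$T_a = \sqrt{n}\,(\bar X - \mu_a) = T_0 - \sqrt{n}\,(\mu_a - \mu_0),$$
$$T_{2a} = T_2 - \sqrt{n}\,(\mu_a - \mu_0),$$
$$T_{1a} = T_1 - \sqrt{n}\,(\mu_a - \mu_0).$$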

To illustrate the computations, let us compute $\beta_a$ for $n = 25$,
$\alpha = .05$, and (i) $T_1 = -\infty$, $T_2 = 1.645$ and (ii)
$T_2 = -T_1 = 1.96$. We see that the value of the sum of the two integrals
giving $\beta_a$ may be obtained from Table lb in the Appendix for given
$(\mu_a - \mu_0)$ as set out in Table 11.1.

Table 11.1

                         (i)                        (ii)
  $5(\mu_a-\mu_0)$   $T_{2a}$   $\beta_a$     $T_{2a}$   $T_{1a}$   $\beta_a$

      -3.0        4.645     .0000        4.960     1.040     .8508
      -2.0        3.645     .0001        3.960      .040     .5160
      -1.0        2.645     .0041        2.960    - .960     .1700
        .0        1.645     .0500        1.960    -1.960     .0500
       1.0         .645     .2594         .960    -2.960     .1700
       2.0       - .355     .6387       - .040    -3.960     .5160
       3.0       -1.355     .9123       -1.040    -4.960     .8508
       4.0       -2.355     .9907       -2.040    -5.960     .9793
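The entries of Table 11.1 can be recomputed directly from the two-integral
form of $\beta_a$. The following is a minimal sketch, with the normal
distribution function taken from Python's standard library in place of the
Appendix table; the helper name `power` is hypothetical.

```python
from statistics import NormalDist

PHI = NormalDist().cdf  # standard normal distribution function

def power(d: float, t1: float, t2: float) -> float:
    """beta_a for d = sqrt(n) * (mu_a - mu_0) and rejection limits T1 < T2."""
    t1a, t2a = t1 - d, t2 - d
    return (1 - PHI(t2a)) + PHI(t1a)

INF = float("inf")
for d in range(-3, 5):
    one_sided = power(d, -INF, 1.645)   # case (i)
    two_sided = power(d, -1.96, 1.96)   # case (ii)
    print(f"{d:+d}  {one_sided:.4f}  {two_sided:.4f}")
```

Each printed row corresponds to one line of the table, with case (i) in the
first column of powers and case (ii) in the second.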


It might be helpful if these results were illustrated graphically as
in Fig. 11.1. Let us consider the difference between the power for
$5(\mu_a - \mu_0) = d = 0$ and the power for $d = 1$. The small shaded area
is the rejection region for $d = 0$, with $\alpha = .05$. The upper shaded
area is the added amount to the rejection area for $d = 1$. The sum of the
shaded areas gives the power ($\beta_a$) for rejecting the hypothesis
$\mu = \mu_0$ when $\mu$ is actually $\mu_0 + .2$, since
$1 = 5(\mu_a - \mu_0)$. In this case $\beta_a = .2594$.

Graphs of various power functions for the above example, depending
on the values of $T_1$ and $T_2$, with fixed $P(\mathrm{I}) = \alpha$, and on
the position of the minimum point are set out in Fig. 11.2.

(i) $T_1 = -\infty$; $T_2 = 1.645$.

(ii) Minimum at $\mu_a = \mu_0$; $T_2 = 1.96$, $T_1 = -1.96$.

(iii) $T_2 = \infty$; $T_1 = -1.645$.

(iv) Minimum at $\mu_a < \mu_0$, for example, $T_2 = 1.75$, $T_1 = -2.33$.

(v) Minimum at $\mu_a > \mu_0$, for example, $T_2 = 2.33$, $T_1 = -1.75$.

From the definition of $\beta_a$, we see that the minimum point is reached
when

$$T_2 - \sqrt{n}\,(\mu_a - \mu_0) = -[T_1 - \sqrt{n}\,(\mu_a - \mu_0)]$$

or

$$T_1 + T_2 = 2\sqrt{n}\,(\mu_a - \mu_0).$$

This relation is found by differentiating $\beta_a$ with respect to $\mu_a$
and setting the derivative equal to zero. It follows that if
$\mu_a = \mu_0$, the minimum point is at $T_1 = -T_2$.
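The location of the minimum can be checked numerically for curve (iv). The
sketch below assumes $n = 25$ as in the example, searches a fine grid of
values of $\mu_a - \mu_0$ (the grid itself is an arbitrary choice made for
this illustration), and compares the grid minimum with
$(T_1 + T_2)/2\sqrt{n}$.

```python
from statistics import NormalDist

PHI = NormalDist().cdf

def power(mu_shift: float, n: int, t1: float, t2: float) -> float:
    """beta_a as a function of mu_a - mu_0 for rejection limits T1 < T2."""
    d = n ** 0.5 * mu_shift
    return (1 - PHI(t2 - d)) + PHI(t1 - d)

n, t1, t2 = 25, -2.33, 1.75                     # curve (iv)
grid = [i / 10000 for i in range(-5000, 5001)]  # mu_a - mu_0 from -0.5 to 0.5
argmin = min(grid, key=lambda s: power(s, n, t1, t2))
predicted = (t1 + t2) / (2 * n ** 0.5)
print(argmin, predicted, power(argmin, n, t1, t2))
```

The minimum falls to the left of $\mu_0$, and the power there is below
$\alpha = .05$, which is what makes this curve a biased test.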






It is instructive to examine these power curves from the standpoint of
the following types of alternative hypotheses:

(i) If $H_a$ asserts $\mu_a > \mu_0$, power curve (i) is uniformly most
powerful since its power is greater than that of any other curve for all
such $\mu_a$. In this case we are willing to accept $H_0: \mu = \mu_0$ even
though the true hypothesis is $\mu < \mu_0$. Hence, the region of rejection
is $T > 1.645$ for $\alpha = .05$.

(ii) Similarly power curve (iii) is uniformly most powerful for testing
the hypothesis $\mu = \mu_0$ against the alternatives $\mu_a < \mu_0$. The
region of rejection is $T < -1.645$ for $\alpha = .05$.

(iii) There is no uniformly most powerful test for testing $H_0$ against
$H_a: \mu_a \ne \mu_0$. This result is evident from a study of (i) and
(iii). Each of these is uniformly most powerful on opposite sides of
$\mu_a = \mu_0$ but has practically no power on the other side. No other
single test can be as powerful as (i) on the right or (iii) on the left. For
$H_a: \mu_a \ne \mu_0$ we must adopt some compromise rejection region. Neyman
and Pearson have suggested using an unbiased test, which in this case
would lead to the use of power curve (ii). For this curve $T_1 = -T_2$. It
should be noted that curves (iv) and (v) are more powerful than (ii) on
one tail but give a power less than $\alpha$ for some alternatives. It
should be emphasized that the Type I error (probability of rejecting $H_0$
when $\mu = \mu_0$) is constant for all of these power curves.

11.5. Composite Hypotheses. The theory of tests of composite
hypotheses has not been completed. Some methods will be illustrated
by use of the $t$ test. Suppose we wish to test the hypothesis
$H_0: \mu = 0$ for $N(\mu,\sigma^2)$ against the alternatives $\mu > 0$, the
set of admissible hypotheses being $\mu \ge 0$, $0 < \sigma^2 < \infty$. The
null hypothesis is composite since it does not specify the value of
$\sigma^2$.

From our study of confidence intervals, it seems reasonable to use
$-\infty < \bar X < t_2 s/\sqrt{n}$ as the acceptance region, where

$$\int_{t_2}^{\infty} f(t)\, dt = \alpha.$$


The rejection region is $t = \bar X\sqrt{n}/s > t_2$. (If $H_a$ was
$\mu < 0$, we should use the acceptance region $t > t_1 = -t_2$ or rejection
region $t < t_1$. Both of these yield uniformly-most-powerful tests;
however, if $H_a$ was $\mu \ne 0$, no uniformly-most-powerful test is
available. Here we might make use of the unbiased test with acceptance
region $t_1 < t < t_2$, where $t_1 = -t_2$ and
$\alpha_1 = \alpha_2 = \alpha/2$.) A more rigorous treatment of this

-posite test is, in general, 

quite complicated owing to the nonspecificity of certain of the parameters
by the null hypothesis. In using the $t$ test, $H_0$ does not specify the
value of $\sigma$; hence, we must make use of $s$, the estimate of $\sigma$.
We must determine the probability that $t = \bar X\sqrt{n}/s > t_2$ when the
sample has been drawn from a population with mean $\mu \ne 0$. Now, when
$t > t_2$, we reject $H_0$; and if $\mu$ is actually greater than zero, a
correct decision has been made. The probability of making this correct
decision is called the power of the test for a given $\mu$ ($\ne 0$) and
$\alpha$. It will be recalled that $\alpha$ is the probability of stating
$\mu \ne 0$ when it actually is zero.

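To evaluate the power of the $t$ test, we recall the joint distribution of
$\bar X$ and $s^2$ from Chap. 7,

$$f(\bar X, s^2) = f_1(\bar X) \cdot f_2(s^2),$$

where

$$f_1(\bar X) = \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma}\,
  e^{-n(\bar X - \mu)^2/2\sigma^2},$$

$$f_2(s^2) = \frac{1}{\Gamma\!\left(\frac{n-1}{2}\right)}
  \left(\frac{n-1}{2\sigma^2}\right)^{(n-1)/2}
  (s^2)^{(n-3)/2}\, e^{-(n-1)s^2/2\sigma^2}.$$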

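Now, the power of the $t$ test to detect a true mean, $\mu = \mu_a$, is
given by

$$P[t = \bar X\sqrt{n}/s > t_2 \mid \mu = \mu_a,\ P(\mathrm{I}) = \alpha]
  = \beta_a,$$

where

$$P(\mathrm{I}) = P(t > t_2 \mid \mu = 0) = \alpha.$$

It will be recalled that the $t$ distribution was obtained from that of
$\bar X$ and $s^2$, so that the power may be written as

$$\beta_a = \int_0^{\infty} f_2(s^2)\, d(s^2)
  \int_{t_2 s/\sqrt{n}}^{\infty} \frac{\sqrt{n}}{\sqrt{2\pi}\,\sigma}\,
  e^{-n(\bar X - \mu_a)^2/2\sigma^2}\, d\bar X,$$

where it is understood that we first determine $t_2$ so that
$P(\mathrm{I}) = \alpha$ and then evaluate the integral. Let
$\mu_a\sqrt{n}/\sigma = t_a$ (known value) and
$\bar X\sqrt{n}/\sigma = t_a + y$; then

$$\beta_a = \int_0^{\infty} f_2(s^2)\, d(s^2)
  \int_{(s t_2/\sigma) - t_a}^{\infty} N(0,1)\, dy.$$

But $s^2/\sigma^2 = \chi^2/(n-1)$, and hence

$$\beta_a = \int_0^{\infty} f(\chi^2)\, d(\chi^2)
  \int_{(\chi t_2/\sqrt{n-1}) - t_a}^{\infty} N(0,1)\, dy.$$

The evaluation of $\beta_a$ must be accomplished by some form of numerical
integration over the region $y > (\chi t_2/\sqrt{n-1}) - t_a$ and
$\chi^2 > 0$. If we let

$$p = \int_0^{\chi^2} f(\chi^2)\, d(\chi^2),$$

then

$$\beta_a = \int_0^1 dp \int_{y_2}^{\infty} N(0,1)\, dy,
  \qquad y_2 = \frac{\chi t_2}{\sqrt{n-1}} - t_a.$$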
Since $p = 0$ when $\chi^2 = 0$ and $p = 1$ when $\chi^2 = \infty$, we may
change to a $(p, y)$ system where $0 \le p \le 1$. We may now compute the
power of the test to detect a given value of $t_a \ne 0$ for a fixed $t_2$
and $n$. In order to compute the power for $t = t_a$, we proceed as follows:

(i) Set down successive values of $p$.

(ii) Ascertain the values of $\chi$ corresponding to each $p$.

(iii) Compute the value of $y_2$ for each $\chi$.

(iv) Determine the area, $P$, under the normal curve between $y_2$ and
$\infty$.

If we plot the values of $P$ as ordinates with the corresponding $p$ values
as the abscissas, then $\beta_a$ is given by the area under this curve as
illustrated in Fig. 11.3. This area may be computed by some method of
numerical integration, such as the trapezoidal rule or Simpson’s rule.





Neyman and Tokarska⁶ have published values of $t_a$ for
$\beta_a$† = .99, .95, .90(.10).10. Using the procedure outlined above, let
us calculate $\beta_a$ for $t_a = 1.15$, which is the value of $t_a$ given in
the Neyman and Tokarska tables, corresponding to $\alpha = .05$,
$\beta_a = .20$, and $n = 3$. If $\alpha = .05$, then $t_2 = 2.920$ and
$y_2 = 2.065\chi - 1.15$. We now obtain the entries in Table 11.2.

Using the trapezoidal rule, we obtain $\beta_a = .2017$ as compared with the
actual value of .20 mentioned earlier.
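Steps (i) to (iv) and the trapezoidal rule can be sketched in a few lines.
The sketch assumes the 2 degrees of freedom of this example, for which the
$\chi^2$ distribution function has the closed form $p = 1 - e^{-\chi^2/2}$;
it uses the $p$ values of Table 11.2 and adds the end point $p = 1$ (where
$P = 0$), which is an assumption about how the area was closed off. The
function names are hypothetical.

```python
import math

T2 = 2.920      # upper 5 per cent point of t with 2 degrees of freedom
T_A = 1.15      # t_a, the standardized true mean
DF = 2          # n - 1

# p values of Table 11.2, with p = 1 (where P = 0) added as the end point.
P_GRID = [1.0, .90, .80, .70, .60, .50, .40, .30, .25, .20, .15, .10,
          .075, .05, .025, .020, .015, .010, .005, .0025, 0.0]

def chi_from_p(p: float) -> float:
    """chi for which the chi-square (2 d.f.) distribution function equals p."""
    return math.sqrt(-2.0 * math.log(1.0 - p))

def upper_normal_area(y: float) -> float:
    """Area under N(0,1) between y and infinity."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def big_p(p: float) -> float:
    """Step (iv): the area P belonging to a given p."""
    if p >= 1.0:
        return 0.0                      # chi is infinite, so no area remains
    y2 = T2 / math.sqrt(DF) * chi_from_p(p) - T_A
    return upper_normal_area(y2)

heights = [big_p(p) for p in P_GRID]
beta_a = sum((P_GRID[i] - P_GRID[i + 1]) * (heights[i] + heights[i + 1]) / 2
             for i in range(len(P_GRID) - 1))
print(round(beta_a, 4))
```

The result should land close to the .2017 quoted above; small differences
can arise because the table works with rounded values of $\chi$ and $y_2$.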

11.6. Use of Power-function Tables in Planning Experiments. In 
case the experimenter has in advance some knowledge of the size of the 
coefficient of variation, that is, the standard deviation of any observation 
expressed as a per cent of the general mean, it will then be possible to 
make use of the tables of Neyman and Tokarska in the planning of 
experiments. 

Example 11.1. (Due to Neyman and Tokarska.) A plant breeder
wishes to compare a new variety, $V_1$, with an established standard $V_0$.

† $\beta_a$ may be found from the tables of Neyman and Tokarska by the
relationship $\beta_a = 1 - P_{II}$. The $n$ of the tables is the degrees of
freedom.

Let $\mu_1$ and $\mu_0$ denote the true mean yields of $V_1$ and $V_0$ per
some unit of area, respectively. The hypothesis to be tested is
$H_0: \mu_1 \le \mu_0$, and the alternative hypothesis is
$H_a: \mu_1 > \mu_0$. In other words, the plant breeder will consider his
problem of producing a better variety as successfully accomplished whenever
he obtains evidence that $H_0$ is not true and therefore that
$\mu_1 > \mu_0$. It is desired to reduce the probability of an unjust
rejection of $H_0$ to $\alpha = .01$. In a completely randomized experiment
each variety is repeated $n' = 8$ times, and hence the pooled
experimental-error
Table 11.2

     $p$       $\chi$      $y_2$        $P$

    .90      2.146      3.282     .00052
    .80      1.794      2.555     .00531
    .70      1.552      2.054     .01996
    .60      1.354      1.645     .04994
    .50      1.177      1.281     .10004
    .40      1.011       .9373    .17430
    .30       .8446      .5941    .27622
    .25       .7585      .4163    .33860
    .20       .6680      .2295    .40924
    .15       .5701      .0273    .48911
    .10       .4590    - .2021    .58008
    .075      .3949    - .3346    .63104
    .05       .3203    - .4886    .68744
    .025      .2250    - .6853    .75342
    .020      .2010    - .7349    .76880
    .015      .1739    - .7910    .78553
    .010      .1418    - .8572    .80433
    .005      .1001    - .9433    .82724
    .0025     .07075   -1.0039    .84229
    .00       .00      -1.1500    .87493



degrees of freedom is 14. According to previous experience the standard
deviation of any single yield is expected to be $\sigma_0 = 6$ per cent of
the general mean yield. The experimenter now wishes to know the size of
differences between the mean yields of varieties $V_0$ and $V_1$ (in favor of
the new variety $V_1$) which he is likely to detect in his experiment in case
they in fact exist.

Now, in order to use Table II from Neyman and Tokarska, we find the
standard deviation of the difference of the two means as

$$\sigma = \sigma_0\sqrt{2/n'} = 6\sqrt{1/4} = 3
  \text{ per cent of the general mean.}$$

But $\Delta = \rho\sigma = 3\rho$ per cent of the general mean, where
$\Delta = \mu_1 - \mu_0$ and $\rho = (\mu_1 - \mu_0)/\sigma$ by definition.
Then, entering Table II opposite $n = 14$ degrees of freedom, we multiply the
tabled values of $\rho$ by 3 to obtain the entries in Table 11.3. The first
pair of entries means that if the true difference in mean yield of $V_1$ over
$V_0$ is as large as 15.54 per cent of the general mean yield, then the
experiment described will detect this difference in 99 per cent of the
cases. From the table it may be seen that a reasonable probability of
detection such as .90 or .80 corresponds to true differences in yields
exceeding 10 per cent of the general mean and that differences under 5 per
cent have a probability of only .20 of being detected. The experimenter now
possesses information enabling him to judge the adequacy of the proposed
experiment. The experiment would be judged satisfactory if it is desired to
discover differences over 10 per cent. On the other hand, if the process of
improving the particular varieties is well advanced, a difference as large
as 5 per cent may be as large as could be expected. In the latter case the
proposed experiment is not satisfactory, and some modification is in order.
Increased precision may be obtained by (i) increasing the number of
repetitions and thereby increasing the degrees of freedom, (ii) improving
the experimental techniques and thereby decreasing the standard deviation of
any single plot yield, or (iii) increasing the size of $\alpha$.
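Without the published tables at hand, detection probabilities of this kind
can be approximated by integrating over the distribution of $\chi$ directly,
in the manner of Sec. 11.5. The sketch below is an illustration under stated
assumptions: Simpson's rule with an arbitrary number of steps, the one-sided
1 per cent point $t_2 = 2.624$ for 14 degrees of freedom supplied by hand,
and hypothetical function names. It is not the method by which the tables
were constructed.

```python
import math

def chi_density(x: float, f: int) -> float:
    """Density of chi = sqrt(chi-square) with f degrees of freedom."""
    if x <= 0.0:
        return math.sqrt(2.0 / math.pi) if f == 1 else 0.0
    log_g = ((f - 1) * math.log(x) - x * x / 2.0
             - (f / 2.0 - 1.0) * math.log(2.0) - math.lgamma(f / 2.0))
    return math.exp(log_g)

def upper_normal_area(y: float) -> float:
    """Area under N(0,1) between y and infinity."""
    return 0.5 * math.erfc(y / math.sqrt(2.0))

def t_test_power(rho: float, f: int, t2: float, steps: int = 4000) -> float:
    """P(t > t2) when the standardized true difference is rho (Simpson's rule)."""
    upper = math.sqrt(f) + 10.0          # chi density is negligible beyond this
    h = upper / steps
    total = 0.0
    for i in range(steps + 1):
        x = i * h
        weight = 1 if i in (0, steps) else (4 if i % 2 else 2)
        total += (weight * chi_density(x, f)
                  * upper_normal_area(t2 * x / math.sqrt(f) - rho))
    return total * h / 3.0

T2_14 = 2.624    # upper 1 per cent point of t with 14 degrees of freedom
for delta in (15.54, 12.03, 10.53, 4.92):
    print(delta, round(t_test_power(delta / 3.0, 14, T2_14), 3))
```

Each true difference $\Delta$ is divided by $\sigma = 3$ to give $\rho$; the
printed probabilities can be compared with the corresponding entries of
Table 11.3.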

Table 11.3

Level of Significance $\alpha$ = 0.01

Size of real differences in
average yields in percentages
of the average yield, %         15.54  13.26  12.03  10.53  9.48  6.87  5.97  4.92  3.45

Probability of detection          .99    .95    .90    .80   .70   .40   .30   .20   .10

Example 11.2. Tang⁶ has obtained the functional form of the power
function of the analysis-of-variance tests, and tables with illustrations
of their uses. While the derivations are beyond the scope of this text,
it is instructive to consider one of Tang’s examples illustrating the use of
his tables in planning experiments. A randomized-blocks experiment is
planned to compare four treatments ($k = 4$), replicated five ($n = 5$)
times. Let $\delta_i$ be the difference between the true $i$th-treatment
effect and the true general mean, so that $\sum_{i=1}^{k} \delta_i = 0$.
Suppose that for the experiment the $\delta_i$ have values $-5$, $-4$, 3, 6,
expressed as percentages of the mean yield per plot. Further, suppose from
past experience that the true standard deviation per plot, $\sigma$, is 10
per cent of the general mean. In order to enter Tang’s tables we calculate

$$\phi = \frac{1}{\sigma}\sqrt{\frac{n\sum\delta_i^2}{k}}
  = \frac{1}{10}\sqrt{\frac{5(86)}{4}} = 1.04.$$
Entering Table II from Tang, with degrees of freedom $f_1 = 3$, $f_2 = 12$,
and $\phi = 1.04$, we find $P_{II} = .7$ roughly. This means that true
treatment differences, such as those given above, would be significant at
the 5 per cent level in about 3 experiments out of 10 only.
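The quantities needed to enter the tables can be computed as follows. The
form $\phi = \sqrt{n\sum\delta_i^2/k}\,/\sigma$ used here is an assumption
inferred from the worked numbers of this example and the next (it reproduces
$\phi = 1.04$ and $0.567\sigma$); the function name is hypothetical.

```python
import math

def tang_phi(deltas, n_reps: int, sigma: float) -> float:
    """phi = sqrt(n * sum(delta_i^2) / k) / sigma; form inferred from the worked numbers."""
    k = len(deltas)
    return math.sqrt(n_reps * sum(d * d for d in deltas) / k) / sigma

deltas = [-5, -4, 3, 6]            # treatment effects, per cent of the mean
n_reps = 5                         # replications (blocks)
phi = tang_phi(deltas, n_reps=n_reps, sigma=10)
f1 = len(deltas) - 1               # treatment degrees of freedom
f2 = f1 * (n_reps - 1)             # error degrees of freedom, randomized blocks
print(sum(deltas), round(phi, 2), f1, f2)
```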

In practice the true treatment differences are not known, but use may
be made of the fact that if $\phi$ were as large as some specified value
$\phi_0$, say, the probability $P_{II}$ of failing to detect the existence of
treatment differences may be obtained from Tang’s tables.

In a second example Tang considers a randomized-blocks experiment
with $k = 6$ treatments and $n = 7$ blocks. Then $f_1 = 5$, $f_2 = 30$, and
Table II, appropriate when using the 5 per cent significance level, shows
$P_{II} = .262$ for $\phi = 1.5$. In this case we would fail to detect the
presence of treatment differences in about one in four times when $\phi$ is
as large as 1.5 or when

$$\sqrt{\frac{\sum\delta_i^2}{k}} = \frac{\sigma}{\sqrt{7}} \times 1.5
  = 0.567\sigma.$$

Assuming the standard deviation of a plot to be about 10 per cent of the
mean yield, then $\sqrt{\sum\delta_i^2/k} = 5.67$ per cent of the mean yield
per plot.

Now, there will be an unlimited number of sets of six values of $\delta_i$
whose sum will be zero and having 5.67 as standard deviation. In order to
obtain upper and lower bounds for at least one value of the $\delta_i$, we
consider the two extreme sets

(a) $\delta_1 = \delta_2 = \delta_3 = \delta_4 = \delta_5 = -\delta_6/5$,

(b) $\delta_1 = \delta_2 = \delta_3 = -\delta_4 = -\delta_5 = -\delta_6$.

For (a) we find

$$|\delta_6| = \sqrt{k-1}\,\sqrt{\frac{\sum_i \delta_i^2}{k}} = 12.68$$

and for (b)

$$|\delta_6| = \sqrt{\frac{\sum_i \delta_i^2}{k}} = 5.67.$$

It may be proved, for this example, that there must be at least one
$\delta$, say $\delta_6$, whose value lies between 12.68 and 5.67.
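The two extreme sets can be written out and checked numerically. In this
sketch the signs within each set are chosen so that $\delta_6$ is positive,
which is an arbitrary choice made for the illustration; the variable names
are hypothetical.

```python
import math

k, n, phi = 6, 7, 1.5
sigma = 10.0                                   # per cent of the mean yield
rms_delta = phi * sigma / math.sqrt(n)         # sqrt(sum(delta_i^2) / k)

# Set (a): five equal effects balanced by one large effect delta_6.
upper = math.sqrt(k - 1) * rms_delta
set_a = [-upper / (k - 1)] * (k - 1) + [upper]

# Set (b): three effects at -rms_delta and three at +rms_delta.
lower = rms_delta
set_b = [-lower] * 3 + [lower] * 3

for name, effects in (("a", set_a), ("b", set_b)):
    rms = math.sqrt(sum(d * d for d in effects) / k)
    print(name, round(sum(effects), 10), round(rms, 2), round(max(effects), 2))
```

Both sets sum to zero and share the same root mean square, while the largest
effect differs between them.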

11.7. The Likelihood-ratio Criterion. In Sec. 9.2 the method of
maximum likelihood was presented as a general method, involving routine
mathematical procedures, for obtaining an estimator of a population
parameter possessing many desirable properties. In an analogous manner
the likelihood-ratio criterion will now be presented as a general method,
involving routine mathematical procedures, for obtaining a “good” test
criterion.

The procedure for obtaining a likelihood-ratio criterion to be used in
testing the hypothesis that $(\theta_1, \theta_2, \dots, \theta_k)$ belongs
to the subspace $\omega$ of the entire parameter space $\Omega$ on the basis
of the random sample $\{X_i\}$ drawn from the population
$f(X; \theta_1, \theta_2, \dots, \theta_k)$ is set out below.

Let the likelihood of the sample be

$$P_s = \prod_{i=1}^{n} f(X_i; \theta_1, \theta_2, \dots, \theta_k).$$

This likelihood will usually have a maximum as the parameters vary over
the entire parameter space $\Omega$. Denote this maximum value by

$$P_s(\hat\theta_1, \hat\theta_2, \dots, \hat\theta_k)$$

or briefly as $P_s(\hat\Omega)$. Similarly, $P_s$ will usually have a
maximum value in $\omega$ which shall be denoted as $P_s(\hat\omega)$. Then
the likelihood-ratio criterion for the hypothesis to be tested is

$$\lambda = \frac{P_s(\hat\omega)}{P_s(\hat\Omega)}.$$

The estimators $\hat\theta_i$ of the population parameters $\theta_i$, which
are obtained as quantities to be substituted in $P_s$ determining
$P_s(\hat\omega)$ and $P_s(\hat\Omega)$, are derived by the method of maximum
likelihood. It follows that $\lambda$ is a function of the sample
observations only, that is, it does not involve any population parameters.

Since $P_s$ is positive as a result of being the product of density
functions and $P_s(\hat\omega)$ is less than or at most equal to
$P_s(\hat\Omega)$ because we are more restricted in maximizing $P_s$ in
$\omega$ than in $\Omega$, $\lambda$ will be a positive fraction. Its range
will be from 0 to 1.

In order to use $\lambda$ as a test criterion in applied statistics, it is
necessary that we obtain the sampling distribution of $\lambda$ on the
assumption that the hypothesis being tested is true. We note that $\lambda$
will be small if $P_s(\hat\omega)$ is much smaller than $P_s(\hat\Omega)$.
We shall wish to reject the hypothesis to be tested in case $\lambda$ is
small.

We now find a $\lambda_\alpha$ such that
$P(\lambda \le \lambda_\alpha) = \alpha$ on the assumption that the
hypothesis to be tested is true. If the calculated value $\lambda_0$ is less
than or equal to $\lambda_\alpha$, that is, if
$\lambda_0 \le \lambda_\alpha$, we reject the hypothesis; otherwise we
accept it. It should be noted that any monotonic function of $\lambda$ may
be used in place of $\lambda$ as the test criterion.

As indicated earlier, a “good” test criterion is one which determines a
region which maximizes the power of detecting true deviations from the
null hypothesis for a given probability of committing a Type I error. In
general it will not be possible to find a region of rejection which will
maximize the power for all alternatives to the null hypothesis. However,
for the simple case of only one alternative, $H_1$, and when both $H_0$ and
$H_1$ are simple hypotheses, it can be shown that the likelihood-ratio test
defines a best critical region. In this case the whole parameter space
$\Omega$ contains only two points. If the alternative hypotheses $H_a$
specify the entire parameter space $\Omega$ other than $\omega$ to be a
range of values on a line, then it is possible to choose a best critical
region for each $H_a$. If this region is the same for each $H_a$, then the
test is said to be uniformly most powerful.

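Example 11.3. Given a random sample of $n$ from $N(\mu,1)$. The null
hypothesis to be tested is $H_0: \mu = \mu_0$, which states that $\omega$ is
a point while $\Omega$ is the whole $\mu$ axis. The likelihood of the sample
is

$$P_s = \left(\frac{1}{\sqrt{2\pi}}\right)^{n} e^{-\frac12\sum(X-\mu)^2}$$

or

$$P_s = \left(\frac{1}{\sqrt{2\pi}}\right)^{n}
  e^{-\frac12\sum(X-\bar X)^2 - (n/2)(\bar X-\mu)^2}.$$

Since the ML estimator for $\mu$ is $\bar X$, we find the maximum value of
$P_s$ in $\Omega$ to be

$$P_s(\hat\Omega) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n}
  e^{-\frac12\sum(X-\bar X)^2}.$$

Also,

$$P_s(\hat\omega) = \left(\frac{1}{\sqrt{2\pi}}\right)^{n}
  e^{-\frac12\sum(X-\bar X)^2 - (n/2)(\bar X-\mu_0)^2}.$$

The likelihood ratio becomes

$$\lambda = e^{-(n/2)(\bar X-\mu_0)^2}.$$

If $\bar X$ is close to $\mu_0$ in value, then the sample is reasonably
consistent with the null hypothesis $H_0$ and $\lambda$ will be close to 1 in
value. Conversely, if $\bar X$ is far from $\mu_0$, the sample will not be
reasonably consistent with $H_0$, and $\lambda$ will ordinarily be close to
zero.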
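The reduction of $\lambda$ to a function of $\bar X$ alone can be checked
numerically by evaluating the two maximized likelihoods directly. This is a
minimal sketch; the observations are made up for the illustration and the
function name is hypothetical.

```python
import math

def likelihood_ratio(sample, mu0: float) -> float:
    """lambda = P_s(omega) / P_s(Omega) for N(mu, 1), computed from the two maxima."""
    n = len(sample)
    xbar = sum(sample) / n

    def log_lik(mu: float) -> float:
        return (-0.5 * n * math.log(2 * math.pi)
                - 0.5 * sum((x - mu) ** 2 for x in sample))

    # In omega, mu is fixed at mu0; in Omega, the maximum is at mu = xbar.
    return math.exp(log_lik(mu0) - log_lik(xbar))

sample = [0.3, -1.1, 0.8, 1.9, 0.4, -0.2, 1.2, 0.6]   # made-up observations
mu0 = 0.0
n = len(sample)
xbar = sum(sample) / n
lam = likelihood_ratio(sample, mu0)
closed_form = math.exp(-0.5 * n * (xbar - mu0) ** 2)
print(lam, closed_form, -2 * math.log(lam), n * (xbar - mu0) ** 2)
```

The last two printed values agree, showing that $-2\log\lambda$ equals
$n(\bar X - \mu_0)^2$, the square of the normal deviate $T_0$ of Sec. 11.4.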

Now, suppose for the above example, or in general, the distribution of
$\lambda$ when $H_0$ is true is $g(\lambda)$ and $P(\mathrm{I}) = \alpha$;
then $\lambda_\alpha$ is determined so that

$$\int_0^{\lambda_\alpha} g(\lambda)\, d\lambda = \alpha.$$

If the calculated $\lambda$, say $\lambda_0$, be less than $\lambda_\alpha$,
we would reject $H_0$, and vice versa.

From the discussion in the above paragraph it follows that the
likelihood-ratio method as described may not always lead to a unique test.
If $H_0$ is a simple hypothesis, a unique distribution of $\lambda$ may be
obtained. On the other hand, if $H_0$ is a composite hypothesis, it will not
in general be possible to obtain a unique distribution for $\lambda$ and
hence no unique test.
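Example 11.4. Given a sample of $n$ from $N(\mu,\sigma^2)$. The null
hypothesis to be tested is $H_0: \mu = 0$, $\sigma^2$ is unspecified. The
entire parameter space is the half plane of Fig. 11.4. The subspace
specified by $H_0$ is the vertical line $\mu = 0$.

Fig. 11.4. [The half plane $\sigma^2 > 0$, with the line $\mu = 0$ marked.]

The likelihood of the sample is

$$P_s = \left(\frac{1}{\sqrt{2\pi}\,\sigma}\right)^{n}
  e^{-\sum(X-\mu)^2/2\sigma^2}.$$

The values of $\mu$ and $\sigma^2$ which maximize $P_s$ in $\Omega$ have
already been found to be

$$\hat\mu = (1/n)\sum X = \bar X,$$
$$\hat\sigma^2 = (1/n)\sum(X - \bar X)^2.$$

Hence

$$P_s(\hat\Omega) = \left[\frac{2\pi}{n}\sum(X - \bar X)^2\right]^{-n/2}
  e^{-n/2}.$$

Also

$$P_s(\hat\omega) = \left[\frac{2\pi}{n}\sum X^2\right]^{-n/2} e^{-n/2}.$$

Hence

$$\lambda = \left[\frac{\sum(X - \bar X)^2}{\sum X^2}\right]^{n/2}.$$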

Now, we know that if $H_0$ is true

$$t^2 = \frac{n\bar X^2}{s^2} = \frac{n(n-1)\bar X^2}{\sum(X - \bar X)^2},$$

where $t$ follows Student’s distribution with $n - 1$ degrees of freedom.
Then, since $\sum X^2 = \sum(X - \bar X)^2 + n\bar X^2$,

$$\lambda^{2/n} = \frac{\sum(X - \bar X)^2}{\sum(X - \bar X)^2 + n\bar X^2}
  = \frac{1}{1 + \dfrac{n\bar X^2}{\sum(X - \bar X)^2}}
  = \frac{1}{1 + t^2/(n-1)}.$$

In this case then, the likelihood-ratio test becomes the $t$ test since
$t^2$ is a monotonic function of $\lambda$. Then

$$P(\lambda < \lambda_\alpha) = P(|t| > t_\alpha) = \alpha.$$

The region for rejection for $t$ is $|t| > t_\alpha$, and we see that large
values of $|t|$ correspond to small values of $\lambda$.
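The identity between $\lambda^{2/n}$ and $1/[1 + t^2/(n-1)]$ can be
confirmed on any sample. A minimal sketch follows; the observations are
made up for the illustration and the function name is hypothetical.

```python
import math

def lambda_and_t(sample):
    """Return the likelihood ratio lambda and Student's t for H0: mu = 0."""
    n = len(sample)
    xbar = sum(sample) / n
    ss_within = sum((x - xbar) ** 2 for x in sample)   # sum (X - Xbar)^2
    ss_total = sum(x * x for x in sample)              # sum X^2
    lam = (ss_within / ss_total) ** (n / 2)
    s2 = ss_within / (n - 1)
    t = xbar * math.sqrt(n) / math.sqrt(s2)
    return lam, t

sample = [1.2, -0.4, 2.1, 0.7, 1.5, -0.3, 0.9]   # made-up observations
lam, t = lambda_and_t(sample)
n = len(sample)
print(lam ** (2 / n), 1 / (1 + t * t / (n - 1)))
```

Because the relation is monotonic in $t^2$, rejecting for small $\lambda$ is
the same rule as rejecting for large $|t|$.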


EXERCISES

11.1. Construct the power curve for $H_a: \mu_a < \mu_0$. Consider values of
$\sqrt{n}\,(\mu_a - \mu_0) = \pm 4, \pm 3, \pm 2, \pm 1, 0$, given the
conditions of the example of Sec. 11.4.

11.2. Discuss rejection regions for testing the hypothesis that the
difference between the population means of two variates, each,
respectively, $N(\mu_1,\sigma^2)$ and $N(\mu_2,\sigma^2)$, is zero. Take
$\Omega$ as $-\infty < \mu_1, \mu_2 < \infty$, $0 < \sigma^2 < \infty$, and
$H_0: \mu_1 = \mu_2$. Hint: Consider the example of Sec. 11.5.

11.3. Set up the admissible hypotheses $\Omega$ and the region of
rejection, $R$, for testing the hypothesis that the population variance of
$N(\mu,\sigma^2)$ is $\sigma_0^2$, based on a random sample of size $n$.
What can be said regarding the power of the test for various alternatives?

11.4. Determine $\Omega$ and the region of rejection, $R$, for testing the
hypothesis that the variances of two normal populations are equal
($\sigma_1^2 = \sigma_2^2$) against the alternative hypothesis that
$\sigma_1^2 = a\sigma_2^2$ ($a > 1$). Since the test criterion is $F$, show
that $F_a = F_\alpha/a$ and that
$\beta_a = I_x\!\left(\frac{n_2}{2}, \frac{n_1}{2}\right)$, where

$$x = \frac{n_2}{n_2 + n_1 F_a}, \qquad n_1 = (df)_1, \quad n_2 = (df)_2.$$

Complete the following power table for $n_1 = 2$, $n_2 = 10$:

     $a$       $F_a$       $x$       $\beta_a$

                        .3466      .005
                        .3981      .010
                        .4782      .025
                        .5493      .050
                        .6310      .100
                        .7579      .250
                        .8706      .500
11.5. Derive the $\lambda$ criterion for testing the null hypothesis
$H_0: m = m_0$, given a random sample of size $n$ from the Poisson
distribution

$$f(X) = e^{-m}(m^X/X!),$$

$0 \le X < \infty$, $m > 0$. The parameter space $\Omega$ is the half line
$m > 0$.

11.6. Given two random samples of sizes $n_1$ and $n_2$ from each of two
populations $N(0,\sigma_1^2)$ and $N(0,\sigma_2^2)$. Derive the $\lambda$
criterion for testing the null hypothesis
$H_0: \sigma_1^2 = \sigma_2^2 = \sigma^2$ (unspecified). The entire
parameter space $\Omega$ is the quarter plane determined by
$\sigma_1^2 > 0$, $\sigma_2^2 > 0$. The subspace $\omega$ is the line
$\sigma_1^2 = \sigma_2^2 = \sigma^2$ (unspecified). Hint: First determine the
joint distribution of $s_1^2$ and $s_2^2$, the sample estimates of
$\sigma_1^2$ and $\sigma_2^2$ respectively. Show that the criterion reduces
to Snedecor’s $F$.

11.7. Repeat Exercise 11.6 when the samples are from the populations
$N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$.

11.8. Given a random sample of size $n$ from $N(\mu,\sigma^2)$. Show that
the $\lambda$ criterion for testing the null hypothesis
$H_0: \sigma^2 = \sigma_0^2$, $\mu$ unspecified, reduces to $\chi^2$. The
entire parameter space is determined so that both $\mu$ and $\sigma^2$ are
unspecified.

11.9. Repeat Example 11.4 when the null hypothesis to be tested is
$H_0: \mu = \mu_0$.

11.10. Repeat Example 11.3 when the random sample of $n$ is from
$N(\mu,\sigma^2)$ and $\sigma^2$ is unspecified by $H_0$.

11.11. Given two random samples of sizes $n_1$ and $n_2$ from the normal
populations $N(\mu_1,\sigma^2)$ and $N(\mu_2,\sigma^2)$. Find the $\lambda$
criterion for testing the null hypothesis $H_0: \mu_1 = \mu_2 = \mu$,
$\sigma^2$ unspecified. The entire parameter space is three-dimensional,
with coordinates $(\mu_1,\mu_2,\sigma^2)$. The subspace $\omega$ is
two-dimensional, with coordinates $(\mu,\sigma^2)$.
CHAPTER 12

SPECIAL USES OF CHI-SQUARE
12.1. Introduction.

... that the proposed test criterion follows the chi-square distribution.

In Chap. 7 it was shown that $(n - 1)s^2/\sigma^2$, where

$$s^2 = \sum (X - \bar{X})^2/(n - 1)$$

is evaluated for a random sample $\{X_i\}$ of $n$ from $N(\mu,\sigma^2)$, is distributed as
$\chi^2$ with $(n - 1)$ degrees of freedom. Methods were described and examples
given in Chaps. 10 and 11 for using the chi-square distribution to set
confidence limits for $\sigma^2$ and to test a hypothesis concerning specified values
of $\sigma^2$.

It is proposed to discuss in this chapter the theory behind other of the
important exact and approximate uses of chi-square in making tests of
significance.
$X_1, X_2, \ldots, X_k$ observations in the respective classes. The expected
values in the corresponding classes are specified a priori to be $m_1, m_2,
\ldots, m_k$, and $\sum X_i = \sum m_i = n$. Then $p_i = m_i/n$ is the probability
of an observation falling in the $i$th class. We wish to test the hypothesis
that the sample distribution in the classes might have come from a population
with these expected values.

It was shown earlier that if an event could take any one of $k$
values $y_1, y_2, \ldots, y_k$ with respective probabilities $p_1, p_2, \ldots, p_k$,
the probability that out of $n$ independent trials $y_1$ would appear $X_1$ times,
$y_2$ would appear $X_2$ times, $\ldots$, $y_k$ would appear $X_k$ times ($\sum X_i = n$)
is

$$\frac{n!}{\prod_{i=1}^{k} X_i!} \prod_{i=1}^{k} p_i^{X_i}.$$

Now, this may be written as

$$f\{X_i\} = \frac{\prod_{i=1}^{k} e^{-np_i}(np_i)^{X_i}/X_i!}{e^{-n}\,n^{n}/n!},$$

since $\sum np_i = \sum X_i = n$. If we identify this last expression with

$$P\{A|B\} = \frac{P\{AB\}}{P\{B\}},$$

we see that $f\{X_i\}$ is the probability of obtaining a set of $k$ independent
Poisson variates, subject to the condition or restriction that the sum of
the variates is equal to $n$. Hence, our problem is to find such a
distribution.
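The identification above can be checked numerically. The following Python sketch is only an illustration: the function names and the small set of counts and probabilities are hypothetical, chosen to show that the multinomial probability equals the joint probability of independent Poisson variates with means $np_i$ divided by the Poisson probability (mean $n$) that their sum is $n$.

```python
import math

def multinomial_probability(counts, probs):
    """n! / prod(X_i!) * prod(p_i ** X_i)."""
    n = sum(counts)
    coef = math.factorial(n)
    for x in counts:
        coef //= math.factorial(x)
    value = float(coef)
    for x, p in zip(counts, probs):
        value *= p ** x
    return value

def poisson_probability(x, mean):
    """e^(-mean) * mean^x / x!"""
    return math.exp(-mean) * mean ** x / math.factorial(x)

def conditional_poisson_probability(counts, probs):
    """Joint probability of independent Poisson variates with means n*p_i,
    divided by the probability that their sum (Poisson, mean n) equals n."""
    n = sum(counts)
    joint = 1.0
    for x, p in zip(counts, probs):
        joint *= poisson_probability(x, n * p)
    return joint / poisson_probability(n, n)

# Hypothetical illustration: 10 observations in 3 classes.
example_counts = [3, 5, 2]
example_probs = [0.2, 0.5, 0.3]
direct = multinomial_probability(example_counts, example_probs)
conditional = conditional_poisson_probability(example_counts, example_probs)
print(direct, conditional)
```

The two printed values should agree to within rounding error, which is all the identity claims.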

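First, we shall show that the Poisson distribution with large $m$
approaches a normal distribution. Let

$$\log f(X;m) = -m + X \log m - \log X!.$$

But

$$\log X! = \log \sqrt{2\pi} + (X + \tfrac{1}{2}) \log X - X + \frac{1}{12X} + \cdots$$

by Stirling's approximation. Let $X = m + \xi$; then

$$\log f(X;m) = -m + m \log m + \xi \log m - \log \sqrt{2\pi}
 - \left(m + \xi + \tfrac{1}{2}\right) \log m
 - \left(m + \xi + \tfrac{1}{2}\right)\left[\frac{\xi}{m} - \frac{\xi^2}{2m^2} + \frac{\xi^3}{3m^3} - \cdots\right]
 + (m + \xi) - \frac{1}{12(m + \xi)} - \cdots$$

$$= -\log \sqrt{2\pi m} - \frac{\xi^2}{2m} + O(m^{-1/2}),$$

where $O(m^{-1/2})$ means that the remaining terms are of order $m^{-1/2}$ or smaller.
Hence

$$f(X;m) = \frac{1}{\sqrt{2\pi m}}\, e^{-(X-m)^2/2m}\left[1 + O(m^{-1/2})\right]$$

or, for large $m$, $X$ is distributed approximately as $N(m,m)$.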
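A rough numerical check of this approximation may be helpful. The Python sketch below is an illustration only: the choice $m = 100$ and the three values of $X$ are arbitrary, and the helper names are hypothetical. It compares the Poisson probability with the normal density having mean $m$ and variance $m$.

```python
import math

def poisson_pmf(x, m):
    """Poisson probability, computed on the log scale to avoid overflow."""
    return math.exp(-m + x * math.log(m) - math.lgamma(x + 1))

def normal_density(x, mean, variance):
    """Density of N(mean, variance) at x."""
    return math.exp(-(x - mean) ** 2 / (2 * variance)) / math.sqrt(2 * math.pi * variance)

m = 100  # arbitrary "large" mean chosen for illustration
ratios = {x: poisson_pmf(x, m) / normal_density(x, m, m) for x in (90, 100, 110)}
for x, r in ratios.items():
    print(x, round(r, 3))
```

Each ratio should be close to 1, with departures of the order $m^{-1/2}$ that shrink as $m$ is increased.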
We may now write the joint distribution of the $X$'s as

$$\prod_{i=1}^{k} \frac{1}{\sqrt{2\pi m_i}}\, e^{-(X_i - m_i)^2/2m_i},$$

subject to the linear restriction

$$\sum_{i=1}^{k} (X_i - m_i) = 0,$$

where $\sum_{i=1}^{k} X_i = n$. Consider the transformation

$$d_i = \frac{X_i - m_i}{\sqrt{m_i}},$$

where $m_i = np_i$. Using results of Example 5.14, it may be shown that

$$\sigma_{ij} = E(d_i d_j) = -\frac{\sqrt{m_i m_j}}{n}, \qquad \sigma_i^2 = \frac{n - m_i}{n}.$$

It has been shown that the $d_i$ approach normal variates as $n$ is increased.
Hence, in the limit

$$\sum_{i=1}^{k} d_i^2 = \sum_{i=1}^{k} \frac{(X_i - m_i)^2}{m_i}$$

would be distributed as $\chi^2$ with $k$ degrees of freedom, except for the one
linear restriction $\sum (X_i - m_i) = 0$. This is equivalent to $\sum d_i \sqrt{m_i} = 0$,
which indicates explicitly that the $d_i$ are not independent.

By the use of completely orthogonal linear forms, it may be shown that
in the limit, $\sum d_i^2$ is actually distributed as $\chi^2$ with $(k - 1)$ degrees of
freedom. To this end consider the case $k = 2$ ($n = m_1 + m_2$). Let

$$z_1 = (d_1 \sqrt{m_1} + d_2 \sqrt{m_2})/\sqrt{n} = 0,$$

$$z_2 = (b_1 d_1 + b_2 d_2),$$

where $b_1$ and $b_2$ are to be chosen so that $\sigma_{z_2}^2 = 1$ and $\sigma_{z_1 z_2} = 0$. It follows
that

$$\sigma_{z_1 z_2} = \frac{b_1 m_2 \sqrt{m_1} + b_2 m_1 \sqrt{m_2} - (b_1 \sqrt{m_2} + b_2 \sqrt{m_1})\sqrt{m_1 m_2}}{n\sqrt{n}} = 0,$$

$$\sigma_{z_2}^2 = \frac{b_1^2 m_2 + b_2^2 m_1 - 2 b_1 b_2 \sqrt{m_1 m_2}}{n} = 1.$$

From the first equation, we see that the value of one of the $b_i$ can be
chosen at random; then this value is inserted in the second equation to solve
for the other. Suppose we let $b_1 = \sqrt{m_2/n}$; then $b_2 = -\sqrt{m_1/n}$.
Also $z_1^2 + z_2^2 = d_1^2 + d_2^2$. Since $z_2$ approaches a variate which is
$N(0,1)$, $d_1^2 + d_2^2 = z_2^2$ approaches a $\chi^2$ variate with 1 degree of freedom.
For $k$ classes, we set up the following completely orthogonal linear forms:

$$z_1 = \frac{d_1 \sqrt{m_1} + d_2 \sqrt{m_2} + \cdots + d_k \sqrt{m_k}}{\sqrt{\sum_{i=1}^{k} m_i}} = 0,$$

$$z_2 = \frac{d_1 \sqrt{m_1 m_2} - m_1 d_2}{\sqrt{m_1 (m_1 + m_2)}},$$

$$z_3 = \frac{d_1 \sqrt{m_1 m_3} + d_2 \sqrt{m_2 m_3} - (m_1 + m_2) d_3}{\sqrt{(m_1 + m_2)(m_1 + m_2 + m_3)}},$$

$$\cdots$$

$$z_k = \frac{d_1 \sqrt{m_1 m_k} + d_2 \sqrt{m_2 m_k} + \cdots - d_k \sum_{i=1}^{k-1} m_i}{\sqrt{\sum_{i=1}^{k-1} m_i \sum_{i=1}^{k} m_i}}.$$

Since $\sigma_{ij} = -\sqrt{m_i m_j}/n$ and $\sigma_i^2 = (n - m_i)/n$ for the $d_i$, it follows
that the $\{z_i\}$ are NID(0,1), and since $z_1 = 0$,

$$\sum_{i=1}^{k} d_i^2 = \sum_{i=1}^{k} z_i^2 = \sum_{i=2}^{k} z_i^2.$$

But the last expression on the right is distributed as $\chi^2$ with $(k - 1)$
degrees of freedom, and hence so is $\sum_{i=1}^{k} d_i^2$.

A test of the hypothesis that the expected values in the several classes
are given by the specified $m_i$ ($i = 1, 2, \ldots, k$) is approximated by obtaining
the probability of getting a $\chi^2$ with $(k - 1)$ degrees of freedom greater
than the computed

$$\chi_0^2 = \sum_{i=1}^{k} \frac{(X_i - m_i)^2}{m_i}.$$

If this probability is unusually small, say .05, we may choose to reject the
hypothesis.
Example 12.1. Genetic data provide examples of the use of $\chi^2$ in
"goodness of fit" tests. Some possible theoretical genetic ratios are
3:1, 1:1, 9:7, 15:1, and 63:1.

In a study of chlorophyll inheritance in corn, Lindstrom¹ found 98
green and 24 yellow seedlings in one progeny of 122 young corn plants.
Presumably green is dominant to yellow and is segregated in a ratio 3:1,
so that we should expect $m_1 = 91.5$ green seedlings and $m_2 = 30.5$ yellow
seedlings if this ratio is correct. To test this hypothesis, we calculate

$$\chi_0^2 = \frac{(98 - 91.5)^2}{91.5} + \frac{(24 - 30.5)^2}{30.5} = 1.847.$$

By interpolation, the probability of obtaining a chi-square value of this
size or larger on the assumption that the genetic ratio is 3:1 is .18.
Hence, the 3:1 ratio is not rejected as a possible fit to the data.
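The arithmetic of Example 12.1 can be retraced with a short Python sketch. The function names are illustrative only. The tail probability relies on the fact that a $\chi^2$ variate with 1 degree of freedom is the square of a $N(0,1)$ variate, so that $P(\chi^2 > c) = \operatorname{erfc}(\sqrt{c/2})$; this replaces the interpolation in tables used in the text, so a small difference from the quoted .18 is to be expected.

```python
import math

def goodness_of_fit_chi2(observed, expected):
    """Sum of (X_i - m_i)^2 / m_i over the classes."""
    return sum((x - m) ** 2 / m for x, m in zip(observed, expected))

def chi2_tail_1df(c):
    """P(chi-square with 1 degree of freedom > c), via the normal tail."""
    return math.erfc(math.sqrt(c / 2.0))

seedlings = [98, 24]                           # green, yellow
total = sum(seedlings)
expected_3_to_1 = [total * 3 / 4, total / 4]   # 91.5 and 30.5 under a 3:1 ratio
chi2_corn = goodness_of_fit_chi2(seedlings, expected_3_to_1)
p_corn = chi2_tail_1df(chi2_corn)
print(round(chi2_corn, 3), round(p_corn, 3))
```

The computed probability is a little over .17, which agrees with the tabular interpolation to the accuracy needed for the conclusion.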

Example 12.2. Federer² fitted a normal curve to the frequency distribution
below of rubber content (percentage) in 378 guayule plants.

Class center   1.5   2.5   3.5   4.5   5.5   6.5   7.5   8.5
Frequency        1     1     2    33   139   155    42     5

The mean and standard deviation of the 378 observations were found to
be 6.07 and .892, respectively. Using the tables for the normal curve,
Federer calculated the expected values for the respective classes set out
below:

Class                <4.0   4.0-5.0   5.0-6.0   6.0-7.0   7.0-8.0   >8.0   Total
Observed frequency     4      33       139       155        42       5      378
Expected frequency    3.8    39.7     133.4     144.7      50.6     5.8    378.0

The frequencies of the first three classes in the first table have been pooled
in the second table in order that the number in any class be not less than 5
approximately, as suggested by Fisher.³ Using the formula developed in
this section Federer found $\chi_0^2 = 3.302$. The proof of the rule given by
Fisher³ for determining the degrees of freedom to be assigned $\chi_0^2$ for this
example is beyond the scope of this text. This rule states that the correct
number of degrees of freedom may be found by subtracting the total number
of restrictions imposed on the data from the total number of classes.
In the example this would be 6 - 3, since the sum, mean, and standard
deviation of the sample and hypothetical curve have been equated. The
probability of obtaining a chi-square value as large as or larger than 3.30
with 3 degrees of freedom lies between .5 and .3. Hence, we have no
evidence to reject the hypothesis that the rubber contents in the 378
guayule plants are normally distributed.

12.3. Contingency Tables. Consider the ordinary 2 X 2 contingency
table used in applied statistics:

              Expectations                              Observations

          b_1          b_2        Total             b_1      b_2     Total
a_1    n p_a p_b    n p_a q_b    n p_a      a_1    n_11     n_12     n_1.
a_2    n q_a p_b    n q_a q_b    n q_a      a_2    n_21     n_22     n_2.
Total  n p_b        n q_b        n          Total  n_.1     n_.2     n_..

In this case $k = 4$, but we have more than one linear restriction. If
there is no a priori knowledge of the values of $p_a$ and $p_b$, we usually use
their maximum-likelihood estimates, which are

$$n\hat{p}_a = n_{1.} \text{ or } n\hat{q}_a = n_{2.}; \qquad n\hat{p}_b = n_{.1} \text{ or } n\hat{q}_b = n_{.2}; \qquad n_{..} = n.$$

These relations constitute three distinct linear restrictions.

Using the methods of Sec. 12.2, it can be shown that, if the given number
of restrictions $r$ can be reduced to $r$ orthogonal restrictions, then

$$\sum_{i=1}^{k} \frac{(X_i - m_i)^2}{m_i}$$

is distributed as $\chi^2$ with $(k - r)$ degrees of freedom. To this end we set
up the following:

$$z_1 = d_{11}\sqrt{p_a p_b} + d_{12}\sqrt{p_a q_b} + d_{21}\sqrt{q_a p_b} + d_{22}\sqrt{q_a q_b} = 0,$$

$$z_2 = \frac{q_a d_{11}\sqrt{p_a p_b} + q_a d_{12}\sqrt{p_a q_b} - p_a d_{21}\sqrt{q_a p_b} - p_a d_{22}\sqrt{q_a q_b}}{\sqrt{p_a q_a}} = 0,$$

$$z_3 = \frac{q_b d_{11}\sqrt{p_a p_b} - p_b d_{12}\sqrt{p_a q_b} + q_b d_{21}\sqrt{q_a p_b} - p_b d_{22}\sqrt{q_a q_b}}{\sqrt{p_b q_b}} = 0,$$

where

$$d_{11} = \frac{n_{11} - n p_a p_b}{\sqrt{n p_a p_b}}, \text{ etc.}$$

By methods similar to those of Sec. 12.2 it can be shown that $d_{ij}$
approaches a $N(0,1)$ variate for large $n$.
Since

$$E(d_{11} z_1) = \sqrt{p_a p_b}\,(1 - p_a p_b) - \sqrt{p_a q_b\, p_a p_b\, p_a q_b} - \sqrt{q_a p_b\, p_a p_b\, q_a p_b} - \sqrt{q_a q_b\, p_a p_b\, q_a q_b}$$

$$= \sqrt{p_a p_b}\,[1 - (p_a p_b + p_a q_b + q_a p_b + q_a q_b)] = 0,$$

and similarly for the other $d$'s and $z$'s, then $E(z_i z_j) = 0$ for $i \ne j$.

A fourth orthogonal linear form is found to be

$$z_4 = d_{11}\sqrt{q_a q_b} - d_{12}\sqrt{q_a p_b} - d_{21}\sqrt{p_a q_b} + d_{22}\sqrt{p_a p_b} \ne 0.$$

It can be shown that $z_4$ is also $N(0,1)$ and is independent of the other three.
It follows that $z_4^2 = \chi^2$ with 1 degree of freedom. It should be noted that
the number of degrees of freedom is reduced by one for every parameter
estimated from the data. If $r$ distinct parameters are estimated, the
number of degrees of freedom for $\chi^2$ is $(k - r - 1)$ (1 degree of freedom
was also lost in making $\sum_{i=1}^{k} \hat{m}_i = \sum_{i=1}^{k} X_i = n$).

To make the test of the independence of the two classifications in the
two-way table, we calculate

$$\chi_0^2 = \frac{(n_{11} - n\hat{p}_a \hat{p}_b)^2}{n\hat{p}_a \hat{p}_b} + \cdots + \frac{(n_{22} - n\hat{q}_a \hat{q}_b)^2}{n\hat{q}_a \hat{q}_b}$$

and compare this calculated $\chi^2$ with the tabular $\chi^2$ of 1 degree of freedom
at the 5 per cent or 1 per cent points. If the calculated $\chi^2$ is greater than
the tabular $\chi^2$ at either per cent point, we say that we have evidence
from this sample that the two classifications are not independent. In
other words, there would be evidence of an interaction between the two
classifications.

More precisely we are evaluating the probability that an observed set of
frequencies $X_1, X_2, \ldots, X_k$ could have resulted from a multinomial
distribution with probabilities $p_1, p_2, \ldots, p_k$. The requirements that
must be met in order that the $\chi^2$ approximation may be used to evaluate
this probability are (i) the frequencies follow the multinomial distribution,
(ii) the expectations are large enough so that the normal approximation is
satisfactory, and (iii) any estimation of the $p$'s should be efficient. For
discussions related to this problem consult Cochran.⁴,⁵

The second paper by Cochran referred to above discusses a "correction
for continuity" which Yates⁶ had proposed earlier and which was subsequently
mentioned in texts by Fisher,³ Snedecor,⁷ and others. For the
2 X 2 contingency table, Yates suggested that .5 be subtracted from the
absolute value of each deviation in computing $\chi^2$ in order to correct for a
slight bias in determining the true probability levels. This slight bias
arises from the fact that the $\chi^2$ distribution is continuous, whereas the
frequencies are discrete. The correction for continuity is not very important
for $\chi^2$ with more than a single degree of freedom and is never to be
used when adding $\chi^2$ values.

Example 12.3. The $F_2$ progeny of a barley cross, Robertson,⁸ segregated
in the following manner:

            F        f      Total
V         1,178     273     1,451
v           291     156       447
Total     1,469     429     1,898

We calculate

$$\chi^2 = \frac{(54.97)^2}{1,123.03} + \frac{(-54.97)^2}{327.97} + \frac{(-54.97)^2}{345.97} + \frac{(54.97)^2}{101.03} = 50.5.$$

The probability of obtaining a chi-square this large or larger on the
assumption of independence of the classification is extremely small.
Hence, there is considerable evidence in favor of association of these two
attributes.
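The calculation in Example 12.3 may be sketched in Python as follows. The function name and its `yates` flag are illustrative choices, not part of the text; the expectations are formed from the marginal totals as in Sec. 12.3, and the optional correction subtracts .5 from each absolute deviation as suggested by Yates.

```python
def chi2_2x2(table, yates=False):
    """Chi-square for a 2 x 2 table, with expectations taken from the margins."""
    (a, b), (c, d) = table
    n = a + b + c + d
    row_totals = [a + b, c + d]
    col_totals = [a + c, b + d]
    total = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / n
            deviation = abs(observed - expected)
            if yates:
                # Yates' correction for continuity
                deviation = max(deviation - 0.5, 0.0)
            total += deviation ** 2 / expected
    return total

barley = [[1178, 273], [291, 156]]
chi2_barley = chi2_2x2(barley)
chi2_barley_corrected = chi2_2x2(barley, yates=True)
print(round(chi2_barley, 2), round(chi2_barley_corrected, 2))
```

With 1 degree of freedom, both the uncorrected and the corrected values lie far beyond the usual tabular points, so the correction does not alter the conclusion here.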

12.4. Homogeneity of a Binomial Series. Suppose that our data
consist of $X_1, X_2, \ldots, X_k$ successes out of each of $n$ independent trials.
In tabular form we have

                                                       Total
Successes     X_1       X_2      . . .    X_k        k Xbar
Failures    n - X_1   n - X_2    . . .  n - X_k      k(n - Xbar)
Trials         n         n       . . .     n         nk

We wish to test the hypothesis that the probability at every trial is $p$
(constant). The probability of obtaining the particular sample values on
the assumption that the hypothesis is true is

$$P = \prod_{i=1}^{k} \frac{n!}{X_i!\,(n - X_i)!}\, p^{X_i} q^{n - X_i}.$$

The maximum-likelihood estimate of $p$ is $\bar{X}/n$. Then the expected value
for each of the success cells is $n\hat{p} = \bar{X}$ and for each of the failure cells is
$n\hat{q} = (n - \bar{X})$. Now, consider the expression

$$\sum_{i=1}^{k} \frac{(X_i - \bar{X})^2}{\bar{X}} + \sum_{i=1}^{k} \frac{(n - X_i - n + \bar{X})^2}{n - \bar{X}}
 = \frac{n \sum_{i=1}^{k} (X_i - \bar{X})^2}{\bar{X}(n - \bar{X})}.$$

We may write this in the form

$$\frac{\sum_{i=1}^{k} (X_i - \bar{X})^2}{npq} \cdot \frac{npq}{n \dfrac{\bar{X}}{n}\left(1 - \dfrac{\bar{X}}{n}\right)}.$$

Now, as $n \to \infty$, $\sum_{i=1}^{k} (X_i - \bar{X})^2/npq$ approaches the tabular $\chi^2$ with
$(k - 1)$ degrees of freedom and

$$\frac{npq}{n \dfrac{\bar{X}}{n}\left(1 - \dfrac{\bar{X}}{n}\right)}$$

approaches one with a negligible error. Hence an approximate test of
homogeneity of a binomial series may be made by using

$$\chi^2 = \frac{n \sum_{i=1}^{k} (X_i - \bar{X})^2}{\bar{X}(n - \bar{X})}$$

with $(k - 1)$ degrees of freedom.

Example 12.4. Ten samples of 25 stalks of corn each examined for
European corn-borer infestation gave the following counts: 11, 7, 3, 8, 15,
2, 10, 21, 18, 9. Is the infestation random?

We calculate

$$\chi^2 = \frac{(25)(336.4)}{(10.4)(14.6)} = 55.39$$

with 9 degrees of freedom. Upon consulting the $\chi^2$ tables we find this
$\chi^2$ value to be highly significant, and hence we have evidence that the
infestation is not random.
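The figures in Example 12.4 can be reproduced with the sketch below. The function name is an illustrative choice; it evaluates $n \sum (X_i - \bar{X})^2 / [\bar{X}(n - \bar{X})]$ and returns it together with the $(k - 1)$ degrees of freedom.

```python
def binomial_homogeneity_chi2(successes, n):
    """n * sum (X_i - Xbar)^2 / (Xbar * (n - Xbar)), with k - 1 degrees of freedom."""
    k = len(successes)
    xbar = sum(successes) / k
    sum_sq = sum((x - xbar) ** 2 for x in successes)
    return n * sum_sq / (xbar * (n - xbar)), k - 1

borer_counts = [11, 7, 3, 8, 15, 2, 10, 21, 18, 9]
chi2_borers, df_borers = binomial_homogeneity_chi2(borer_counts, n=25)
print(round(chi2_borers, 2), df_borers)
```

Here $\bar{X} = 10.4$ and $\sum (X_i - \bar{X})^2 = 336.4$, which are the quantities appearing in the displayed calculation.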

12.5. Homogeneity of a Poisson Series. The data consist of $X_1,
X_2, \ldots, X_k$. We wish to test the hypothesis that each of the $X$'s
comes from a Poisson distribution with the same $m$. The probability of
obtaining the particular sample values on the assumption that the
hypothesis is true is

$$P = \prod_{i=1}^{k} \frac{e^{-m} m^{X_i}}{X_i!}.$$

It can be shown that the maximum likelihood estimator for $m$ is $\bar{X}$. Consider
the expression

$$\sum_{i=1}^{k} \frac{(X_i - \bar{X})^2}{\bar{X}} = \frac{\sum_{i=1}^{k} (X_i - \bar{X})^2/m}{\bar{X}/m}.$$

Now, as $n \to \infty$, $\sum_{i=1}^{k} (X_i - \bar{X})^2/m$ approaches the tabular $\chi^2$ with $(k - 1)$
degrees of freedom and $\bar{X}/m$ approaches 1. Hence we may use

$$\chi^2 = \sum_{i=1}^{k} \frac{(X_i - \bar{X})^2}{\bar{X}}$$

with $(k - 1)$ degrees of freedom as an approximate test of homogeneity
of a Poisson series.

Example 12.5. The number of wireworms in the check plots of a
Latin square were 2, 6, 4, 9, 8. Is the infestation random?

Assuming that the counts are distributed in a Poisson fashion, we
calculate $\chi^2 = 5.66$ with 4 degrees of freedom. Upon consulting the
tables we find the probability of obtaining such a chi-square value on the
assumption of homogeneity to be large; hence we have no evidence that
the null hypothesis is false.
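Example 12.5 may be checked in the same way. In the sketch below the function names are illustrative. The tail probability uses the closed form available for an even number of degrees of freedom, $P(\chi^2_{2r} > c) = e^{-c/2} \sum_{j=0}^{r-1} (c/2)^j/j!$, in place of the printed tables.

```python
import math

def poisson_homogeneity_chi2(counts):
    """sum (X_i - Xbar)^2 / Xbar, with k - 1 degrees of freedom."""
    k = len(counts)
    xbar = sum(counts) / k
    return sum((x - xbar) ** 2 for x in counts) / xbar, k - 1

def chi2_tail_even_df(c, df):
    """P(chi-square with an even number df of degrees of freedom > c)."""
    if df <= 0 or df % 2 != 0:
        raise ValueError("this closed form needs a positive even number of degrees of freedom")
    half = c / 2.0
    return math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(df // 2))

wireworms = [2, 6, 4, 9, 8]
chi2_worms, df_worms = poisson_homogeneity_chi2(wireworms)
p_worms = chi2_tail_even_df(chi2_worms, df_worms)
print(round(chi2_worms, 2), df_worms, round(p_worms, 2))
```

The probability comes out a little above .2, which is "large" in the sense used in the example.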

Further discussion relative to the material in Secs. 12.4 and 12.5 may
be found in Fisher⁹ and Cochran.⁴

12.6. Combination of Probabilities. Suppose that the following
information has been obtained from two experiments:

Experiment 1. $\bar{A} - \bar{B} = 2.15$, estimated standard error is 1.28,
degrees of freedom are 30, $t = 1.680$, and the single-tailed probability is
$p_1 = .0544$.

Experiment 2. Out of 10 trials $A$ was superior to $B$ 8 times.

$$p_2 = (\tfrac{1}{2})^{10} + 10(\tfrac{1}{2})^{9}(\tfrac{1}{2}) + 45(\tfrac{1}{2})^{8}(\tfrac{1}{2})^{2} = .0547.$$

Can we make a test of the hypothesis that $A = B$, with the alternate
hypothesis that $A > B$, by combining the information from these two
experiments?

Let $X$ be distributed as $f(X)\,dX$, $-\infty < X < \infty$; then it is possible to
show that $-2 \log_e p_i$ follows the $\chi^2$ distribution with 2 degrees of freedom,
where

$$p_i = \int_{-\infty}^{X} f(t)\,dt.$$

To this end we find the distribution of $p_i$ to be

$$f(X)\,\frac{dX}{dp_i}\,dp_i.$$

But

$$\frac{dp_i}{dX} = f(X);$$

hence the distribution of $p_i$ is $dp_i$. Also, when $X \to -\infty$, $p_i \to 0$, and
when $X \to \infty$, $p_i \to 1$. Let

$$u = \log_e p_i;$$

then the distribution of $u$ is

$$\frac{dp_i}{du}\,du = e^{u}\,du, \qquad -\infty < u < 0.$$

Let

$$z = -2u, \qquad dz = -2\,du;$$

then the distribution of $z = -2 \log_e p_i$ is

$$(\tfrac{1}{2})\,e^{-z/2}\,dz, \qquad 0 < z < \infty.$$

But the distribution of $\chi^2$ with 2 degrees of freedom is

$$(\tfrac{1}{2})\,e^{-\chi^2/2}\,d(\chi^2).$$

Hence, $z = -2 \log_e p_i$ is distributed as $\chi^2$ with 2 degrees of freedom.

Since the $p_i$ are independent, the sum of $k$ such $\chi^2$ values is distributed
as $\chi^2$ with $2k$ degrees of freedom.

Applying the method developed above to the two experiments, we find

$$\log_{10} p_1 = \bar{2}.7356$$
$$\log_{10} p_2 = \bar{2}.7380$$
$$\text{sum} = \bar{3}.4736 = -2.5264$$

and

$$\chi_0^2 = -2(-2.5264)(2.3026) = 11.634,$$

with 4 degrees of freedom. The probability of obtaining a $\chi^2$ as large as
or larger than this value on the assumption that the hypothesis tested is
true is .02.
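The combination of the two probabilities can be retraced in Python. The function name below is illustrative. It forms $-2 \sum \log_e p_i$ and evaluates the tail of $\chi^2$ with $2k$ degrees of freedom from the closed form $e^{-c/2} \sum_{j=0}^{k-1} (c/2)^j/j!$, which holds for an even number of degrees of freedom. The binomial probability $p_2$ is recomputed exactly as 56/1024, so the statistic differs from the text's 11.634 only in the third decimal place.

```python
import math

def combine_probabilities(p_values):
    """Return (-2 * sum log_e p_i, degrees of freedom, tail probability)."""
    k = len(p_values)
    statistic = -2.0 * sum(math.log(p) for p in p_values)
    half = statistic / 2.0
    tail = math.exp(-half) * sum(half ** j / math.factorial(j) for j in range(k))
    return statistic, 2 * k, tail

p1 = 0.0544                                               # single-tailed t probability, Experiment 1
p2 = sum(math.comb(10, r) for r in (8, 9, 10)) / 2 ** 10  # 8 or more successes in 10 trials
combined_chi2, combined_df, combined_p = combine_probabilities([p1, p2])
print(round(combined_chi2, 2), combined_df, round(combined_p, 3))
```

Neither experiment alone reaches the 5 per cent point, but the combined probability is about .02.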

12.7. Bartlett's Test of Homogeneity of Variances. Suppose we have $k$
independent sample variances $v_i$ with $n_i$ degrees of freedom each from
populations which are $N(\mu_i,\sigma_i^2)$. It is desired to test

$$H_0\colon \sigma_1^2 = \sigma_2^2 = \cdots = \sigma_k^2 = \sigma^2;$$

in other words, that each $v_i$ is an estimate of the same population variance
$\sigma^2$. Bartlett¹⁰ has proposed the criterion $Q/l$, which can be shown to be
approximately distributed as $\chi^2$ with $(k - 1)$ degrees of freedom, as a
test of $H_0$. In this expression
$$Q = n \log \bar{v} - \sum_{i=1}^{k} n_i \log v_i$$

and

$$l = 1 + \frac{1}{3(k - 1)} \left( \sum_{i=1}^{k} \frac{1}{n_i} - \frac{1}{n} \right),$$

where $n = \sum_{i=1}^{k} n_i$ and $\bar{v} = \sum_{i=1}^{k} n_i v_i/n$.
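As a computational sketch, Bartlett's criterion may be evaluated as below, taking $Q = n \log \bar{v} - \sum n_i \log v_i$ and $l = 1 + [\sum (1/n_i) - 1/n]/[3(k - 1)]$ with natural logarithms. The function name and the two small data sets are hypothetical and serve only to illustrate the arithmetic.

```python
import math

def bartlett_q_over_l(variances, dfs):
    """Return (Q / l, k - 1) for sample variances v_i with n_i degrees of freedom."""
    k = len(variances)
    n = sum(dfs)
    v_bar = sum(n_i * v_i for n_i, v_i in zip(dfs, variances)) / n   # pooled variance
    q = n * math.log(v_bar) - sum(n_i * math.log(v_i) for n_i, v_i in zip(dfs, variances))
    l = 1.0 + (sum(1.0 / n_i for n_i in dfs) - 1.0 / n) / (3.0 * (k - 1))
    return q / l, k - 1

# Hypothetical illustration: two variances, 10 degrees of freedom each.
stat_unequal, df_unequal = bartlett_q_over_l([1.0, 4.0], [10, 10])
# Identical variances give Q = 0, whatever the degrees of freedom.
stat_equal, _ = bartlett_q_over_l([2.5, 2.5, 2.5], [8, 12, 20])
print(round(stat_unequal, 3), df_unequal, round(stat_equal, 6))
```

The value of $Q/l$ is then referred to the $\chi^2$ table with $(k - 1)$ degrees of freedom.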

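The proof that $Q/l$ is approximately distributed as $\chi^2$ with $(k - 1)$
degrees of freedom will be outlined here. Using the distribution of $v_i$
(with $n_i$ degrees of freedom), we find that the moment generating function
and cumulant generating function for $\log v_i$ are

$$m(t) = \left(\frac{2\sigma^2}{n_i}\right)^{t} \frac{\Gamma\left(\frac{n_i}{2} + t\right)}{\Gamma\left(\frac{n_i}{2}\right)}$$

and

$$K(t) = t \log\left(\frac{2\sigma^2}{n_i}\right) + \log \Gamma\left(\frac{n_i}{2} + t\right) - \log \Gamma\left(\frac{n_i}{2}\right).$$

From Stirling's formula for approximating factorials,

$$\Gamma(x) = \sqrt{2\pi}\, e^{-(x-1)} (x - 1)^{x - \frac{1}{2}} \left[1 + \frac{1}{12(x - 1)}\right].$$

Hence, omitting the subscript $i$ for convenience,

$$K(t) = t\left(\log \sigma^2 - \log \frac{n}{2}\right) - t + \left(t + \frac{n - 1}{2}\right) \log\left(t + \frac{n - 2}{2}\right)
 - \left(\frac{n - 1}{2}\right) \log\left(\frac{n - 2}{2}\right)
 + \log\left[1 + \frac{1}{6(2t + n - 2)}\right] - \log\left[1 + \frac{1}{6(n - 2)}\right].$$

Since

$$\log\left(t + \frac{n - 2}{2}\right) = \log\left(1 + \frac{2t}{n - 2}\right) + \log\left(\frac{n - 2}{2}\right),$$

$$\log\left(\frac{n - 2}{2}\right) = \log\left(\frac{n}{2}\right) + \log\left(1 - \frac{2}{n}\right),$$

and

$$\log (1 + x) = x - \frac{x^2}{2} + \cdots,$$

we have

$$K(t) = t \log \sigma^2 - t + \left(t + \frac{n - 1}{2}\right)\left[\frac{2t}{n - 2} - \frac{4t^2}{2(n - 2)^2} + \frac{8t^3}{3(n - 2)^3} - \cdots\right]
 - t\left[\frac{2}{n} + \frac{4}{2n^2} + \frac{8}{3n^3} + \cdots\right]
 + \left[\frac{1}{6(2t + n - 2)} - \frac{1}{6(n - 2)}\right].$$

Upon simplifying the last bracket to

$$\frac{-t}{3(n - 2)^2\left(1 + \dfrac{2t}{n - 2}\right)}$$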

we may write the cumulants as, upon replacing the subscript i, 
ki - log <r 1 + n . _ 2 »■ nf 3 (rii - 2) 2 


f— 

L?ii — 


• 1 2 1 1 
= log ' ^ “ w 

ni — 1 2 

2 ” (m - 2) 2 3(^ - 2)' 


neglecting terms of order wy 3 , 

-1 ( re * + n< + 1)’ 


. „ 4 (ft* —1) * ** | .i. ■* 2 _i_ o™. _l_ q\ 

*• = 6 Ls^ -'2)S _ TnT^W m- 2) 4 J “ nt + £) ' 

The above results may be used to find the cumulants of $Q$, which can
be shown to be

$$\kappa_r[Q] = -(-n)^r \kappa_r(\log \bar{v}) + \sum (-n_i)^r \kappa_r(\log v_i).$$

Bartlett obtains the above results by the following arguments: Now,

$$Q = \log \prod_{i=1}^{k} \left(\frac{\bar{v}}{v_i}\right)^{n_i}.$$

It can be shown that the distribution of $\bar{v}/v_i$ is independent of $\bar{v}$; hence

$$K[Q + (-n \log \bar{v})] = K(Q) + K(-n \log \bar{v}),$$

$$K(Q) = K[Q + (-n \log \bar{v})] - K(-n \log \bar{v})$$

$$= K\left[\sum (-n_i) \log v_i\right] - K[-n \log \bar{v}].$$
Hence, the cumulants for $Q$ are


*1 = a* = l n log cr 2 — 1 


n log <j 2 + k + 


" (k ~ 1} t 1 + 3(F^I) (X i ~ «)] = 


n t + 1 + 


k 2 = cr 2 = — - I n 2 + n 


~ 2(k ~ 1} I 1 + 3(fc^I) (X i " 0 

t'=l 

k 

* 2(i: _ 1} I 1 + 31 *^ (X i ~ = 2{k - 1)12 ’ 


k 

(n 2 + 2n + 2) + 4 2( ^ + 2 + ^ 




k 

Vi 

Z-/ w* 


■ 8 — 1) Z 3 . 


Since

$$\kappa_r\left(\frac{Q}{l}\right) = \frac{\kappa_r(Q)}{l^r},$$

we obtain the first three approximate cumulants of $Q/l$ as

$$\kappa_1 = (k - 1), \qquad \kappa_2 = 2(k - 1), \qquad \kappa_3 = 8(k - 1),$$

which are also the first three cumulants for the $\chi^2$ distribution with
$(k - 1)$ degrees of freedom. Hence, $Q/l$ will be approximately distributed
as $\chi^2$ with $(k - 1)$ degrees of freedom.

12.8. Test of a Second-order Interaction in a 2 X 2 X 2 Contingency
Table. Suppose we extend the 2 X 2 contingency table of Sec. 12.3 to
the 2 X 2 X 2 table below:

                 A_1                       A_2
          B_1         B_2           B_1         B_2
        C_1   C_2   C_1   C_2     C_1   C_2   C_1   C_2    Totals

        p_1   p_2   p_3   p_4     p_5   p_6   p_7   p_8      1
        m_1   m_2   m_3   m_4     m_5   m_6   m_7   m_8      n
        x_1   x_2   x_3   x_4     x_5   x_6   x_7   x_8      n

The three classifications are designated by $A$, $B$, and $C$, respectively,
while $p_i$, $m_i$, and $x_i$ are the probabilities, expected values, and observed
values, respectively, corresponding to the respective $C_i$ subclasses.

The first-order interactions such as $BC$ were discussed in Sec. 12.3.
The $BC$ interaction for $A_1$ is defined as

$$\frac{p_1/p_2}{p_3/p_4}$$

and for $A_2$ as

$$\frac{p_5/p_6}{p_7/p_8}.$$

The null hypothesis tested for a 2 X 2 contingency table is

$$p_1/p_2 = p_3/p_4,$$

which is equivalent to

$$\frac{p_1/p_2}{p_3/p_4} = 1,$$

that is, the interaction is unity.

The null hypothesis for testing the existence of an $ABC$ interaction is
that the $BC$ interaction is the same for both $A_1$ and $A_2$, or symbolically
that

$$\frac{p_1/p_2}{p_3/p_4} = \frac{p_5/p_6}{p_7/p_8},$$

which reduces to

$$p_1 p_4 p_6 p_7 = p_2 p_3 p_5 p_8.$$

Estimates of the $p_i$'s or $m_i = np_i$ may be obtained by the method of
maximum likelihood. The probability of obtaining a particular set of
observed $x_i$'s is given by

$$P_s = \frac{n!}{x_1!\,x_2! \cdots x_8!}\; p_1^{x_1} p_2^{x_2} \cdots p_8^{x_8}.$$
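It follows that

$$P_s = \frac{n!}{x_1!\,x_2! \cdots x_8!} \prod_{i=1}^{8} \left(\frac{m_i}{n}\right)^{x_i}.$$

Hence, apart from a constant, the logarithm of the likelihood function is

$$L = \sum_{i=1}^{8} x_i \log m_i,$$

which is subject to the restrictions

(i) $m_1 m_4 m_6 m_7 = m_2 m_3 m_5 m_8$,

(ii) $\sum_{i=1}^{8} m_i = n$.

In order to determine estimators of the $m_i$, we form the augmented
function

$$L' = \sum_{i=1}^{8} x_i \log m_i + k(\lambda - \mu) - l\left(\sum_{i=1}^{8} m_i - n\right),$$

where we have set

$$\lambda = m_1 m_4 m_6 m_7, \qquad \mu = m_2 m_3 m_5 m_8,$$

and obtain the partial derivatives

$$\frac{\partial L'}{\partial m_1} = \frac{x_1}{m_1} + \frac{k\lambda}{m_1} - l = 0,$$

$$\frac{\partial L'}{\partial m_2} = \frac{x_2}{m_2} - \frac{k\mu}{m_2} - l = 0,$$

$$\cdots$$

$$\frac{\partial L'}{\partial m_8} = \frac{x_8}{m_8} - \frac{k\mu}{m_8} - l = 0.$$

The following equations may now be obtained:

$$x_1 = m_1 l - k\lambda, \qquad x_2 = m_2 l + k\mu,$$
$$x_4 = m_4 l - k\lambda, \qquad x_3 = m_3 l + k\mu,$$
$$x_6 = m_6 l - k\lambda, \qquad x_5 = m_5 l + k\mu,$$
$$x_7 = m_7 l - k\lambda, \qquad x_8 = m_8 l + k\mu.$$

Upon adding the eight equations, we obtain

$$n = ln - 4k\lambda + 4k\mu,$$

or

$$n = ln,$$

since $\lambda = \mu$, and hence $l = 1$. Using this result, and setting $k\lambda = k\mu = \delta$,
we obtain the estimators

$$\hat{m}_1 = x_1 + \delta, \qquad \hat{m}_2 = x_2 - \delta,$$
$$\hat{m}_4 = x_4 + \delta, \qquad \hat{m}_3 = x_3 - \delta,$$
$$\hat{m}_6 = x_6 + \delta, \qquad \hat{m}_5 = x_5 - \delta,$$
$$\hat{m}_7 = x_7 + \delta, \qquad \hat{m}_8 = x_8 - \delta.$$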

Multiplying both sides of the equality tested by the null hypothesis by
$n^4$, we obtain

$$m_1 m_4 m_6 m_7 = m_2 m_3 m_5 m_8.$$

An equation for obtaining a value for $\delta$ may be obtained by substitution,
that is,

$$(x_1 + \delta)(x_4 + \delta)(x_6 + \delta)(x_7 + \delta) = (x_2 - \delta)(x_3 - \delta)(x_5 - \delta)(x_8 - \delta).$$

From $\delta$ and the $x_i$'s, values for the $\hat{m}_i$ may be obtained. A test of the
null hypothesis may now be performed by using the criterion

$$\chi^2 = \sum_{i=1}^{8} \frac{(x_i - \hat{m}_i)^2}{\hat{m}_i},$$

where the associated degrees of freedom are determined by subtracting the
number of independent parameters estimated from the total number of
classes, that is, 8 - 1 - 6, or 1, in this case.
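The equation for $\delta$ has no convenient closed form, but it is easily solved numerically. The Python sketch below is one possible approach, not the method of the text: it uses bisection, relying on the fact that the left-hand product increases and the right-hand product decreases in $\delta$ while every fitted value stays positive. The function name is illustrative, and the counts are those quoted in Exercise 12.11, arranged under the assumption that $A$ is alive or dead, $B$ is long or short, and $C$ is planting at once or in spring.

```python
def three_way_interaction_chi2(x, tol=1e-10):
    """Solve (x1+d)(x4+d)(x6+d)(x7+d) = (x2-d)(x3-d)(x5-d)(x8-d) for d by bisection.

    x holds the eight observed counts x1..x8 in the order of the table of Sec. 12.8.
    Returns (delta, chi-square); all counts are assumed to be positive.
    """
    signs = [+1, -1, -1, +1, -1, +1, +1, -1]   # sign of delta in each fitted value

    def difference(delta):
        left = right = 1.0
        for count, sign in zip(x, signs):
            if sign > 0:
                left *= count + delta
            else:
                right *= count - delta
        return left - right

    low = -min(c for c, s in zip(x, signs) if s > 0)    # difference(low) < 0
    high = min(c for c, s in zip(x, signs) if s < 0)    # difference(high) > 0
    while high - low > tol:
        middle = (low + high) / 2.0
        if difference(middle) < 0:
            low = middle
        else:
            high = middle
    delta = (low + high) / 2.0
    fitted = [count + sign * delta for count, sign in zip(x, signs)]
    chi2 = sum(delta ** 2 / m for m in fitted)          # each (x_i - m_i)^2 equals delta^2
    return delta, chi2

# Counts from Exercise 12.11 (ordering as assumed in the lead-in above).
plum_counts = [156, 84, 107, 31, 84, 156, 133, 209]
delta_plum, chi2_plum = three_way_interaction_chi2(plum_counts)
print(round(delta_plum, 1), round(chi2_plum, 2))
```

The result may be compared with the values $\delta = 5.1$ and uncorrected $\chi^2 = 2.274$ quoted in Exercise 12.11; small differences in the last figure arise from rounding $\delta$.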


EXERCISES

12.1. (Some of Beall's data given by Neyman.¹¹) Fit a Poisson curve
to the following distribution of European corn borers in 120 groups of 8
hills each. Use the method illustrated in Example 12.2 to calculate a
"goodness of fit" chi-square.

No. of borers         0    1    2    3    4    5    6    7    8    9   10   11   12
Observed frequency   24   16   16   18   15    9    6    5    3    4    3    0    1

12.2. Use the method of Sec. 12.8 to develop a test for interaction in a
2 X 2 contingency table. Show that this is the same test as the test of
independence of the two classifications developed in Sec. 12.3.

12.3. If the entries in the cells of a 2 X 2 contingency table are designated
$a$, $b$, $c$, and $d$ and $a + b + c + d = n$, it can be shown that the
exact probability of any observed set of entries is given by

$$\frac{(a + b)!\,(c + d)!\,(a + c)!\,(b + d)!}{n!\,a!\,b!\,c!\,d!},$$

where the marginal frequencies are assumed to be fixed in repeated
sampling. Use the formula and Stirling's approximation for factorials
for the data in Example 12.3.

12.4. Calculate chi-square corrected for continuity for Example 12.3.

12.5. A more rigorous derivation of the quantity distributed as chi-square
to be used in testing homogeneity of a binomial series starts by
showing that

$$\lim_{n \to \infty} C(n,x)\,p^x q^{n-x} = \frac{1}{\sqrt{2\pi npq}}\, e^{-(x - np)^2/2npq}.$$

Assuming this relationship, complete the argument.

12.6. Using the approach suggested in Exercise 12.5, discuss an alternate
method to that used in Sec. 12.5 of obtaining the quantity distributed
as chi-square to be used in testing homogeneity of a Poisson series.

12.7. The method used in Sec. 12.6 to obtain the distribution of $p_i$
incidentally proves a general theorem which states that any continuous
distribution may be transformed into a rectangular distribution. Using
this result, complete the argument for showing that there exists at least
one transformation which will transform any continuous frequency distribution
into any other continuous frequency distribution.

12.8. Check the expression given for $m(t)$ for $\log v_i$ in Sec. 12.7 by
suitable modification of the results of Exercise 7.15.

12.9. Apply Bartlett's test of homogeneity of variances to Federer's²
data from seven nonselected strains of guayule which were in the 54+
chromosome group.

s t - 9.28 6.80 7.26 7.43 9.99 14.02 10.80 

n ‘ 117 119 117 115 119 116 ^ 

12.10. Check the agreement between values of $Q$ in Sec. 12.7 and
Snedecor's $F$ for $k = 2$ and $n_1 = n_2 = 1$, 2, 3, and 6. Is one of these
exact?

12.11. Bartlett,¹² using published data of Hoblyn and Palmer given
below, obtained $\delta = 5.1$, an uncorrected $\chi^2 = 2.274$, and a corrected
$\chi^2 = 1.850$ in testing interaction in a 2 X 2 X 2 contingency table. The
experiment was planned to investigate the propagation of plum rootstocks
from root cuttings, the number of cuttings for the variety represented
being 240 for each of the four treatments (only one treatment is considered
here).

                              Alive                           Dead
Length of cutting      Time of planting               Time of planting
                     At once   In spring   Total    At once   In spring   Total

Long                   156        84        240        84        156       240
Short                  107        31        138       133        209       342

Total                  263       115        378       217        365       582

Check Bartlett's results.
Part II

ANALYSIS OF EXPERIMENTAL MODELS 
BY LEAST SQUARES 







CHAPTER 13 

REGRESSION ANALYSIS 

13.1. Introduction. In Part I, we have presented some of the basic
statistical concepts of estimation and tests of hypotheses. We have used
as our basic estimation procedure the method of maximum likelihood,
because it has certain optimum properties, such as producing a minimum
variance estimator and a sufficient statistic if the latter exists. Now we
propose to consider the problem of estimating the value of some dependent
variate, $Y$, on the basis of one or more other fixed variables,
$X$. A variate will be understood to have a probability
distribution, while a

thtTartbihty of P Sr this V«^J*«** 
expected to fluctuate from sample to sample, ^ 1 ® f different 

ts&ss&zsszL —-* —** 

„ bi „»t, dktributa., th, 

A « wZtL ,o «. c„,e m MW- - »« * 

straight line of the form

$$E(Y|X) = \alpha + \beta X.$$

In fact, if $X$ and $Y$ were distributed as a bivariate normal,

$$E(Y|X) = \alpha + \beta X.$$

fade iSuffiblunderstood that in 

between $Y$ and $X$. It is known that we can approximate a short
interval of most functions by a straight line; hence, if we collect data for a
small range of $X$, it is possible that a straight line will fit the data quite
well even though the true relationship is not linear.

Let us assume that the measured value of $Y$ can be written as

$$Y = E(Y|X) + \epsilon,$$

toby x r ° °f 7 not aoo °nnted 

curve is selected so that the residuals are ofT^ 6 ^ th ® regression 
usual added assumption that the e are mmo^Tfr^’ With the 
tion of r fixed variates, we might write ’ ' If 7 ls a lmear f «nc- 

$$Y = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_r X_r + \epsilon,$$

where $E(Y) = \alpha + \beta_1 X_1 + \beta_2 X_2 + \cdots + \beta_r X_r$. This equation assumes
that the only error involved is $\epsilon$; in other words, there is no error in the
$X$'s. Hence we are considering only a one-variable normal distribution
with the mean of $Y$ being approximated by a linear function of the
$X$'s. If the $X$'s are also variates, in other words, if the $X$'s
have probability distributions of their own, we are led to consider the
more complicated problem of multivariate analysis. Also, we are considering
here only regression equations which are linear in the regression
coefficients, $\alpha$ and the $\beta$'s. Methods of handling multivariate problems
and problems of nonlinearity of the regression coefficients are beyond
the scope of this book. It should be emphasized that nonlinearity of the
$X$'s can be handled by the introduction of substitute terms in the regression
equation. For example, $X_2$ might very well represent $X_1^2$.

In order to estimate the relationship between $Y$ and $X$ (or between $Y$
and $X_1, X_2, \ldots, X_r$), $n$ sets of simultaneous observations
will be obtained on $Y$ and $X$. We can write each observed value $Y_j$ in
terms of estimates of $E(Y_j)$ and $\epsilon_j$ as

$$Y_j = \hat{Y}_j + e_j, \qquad j = 1, 2, \ldots, n,$$

where $\hat{Y}_j$ is the estimate of $E(Y_j)$ and $e_j$ the estimate of $\epsilon_j$. Hence if $E(Y)$
is a linear relationship,

$$Y_j = \alpha + \beta X_j + \epsilon_j = a + bX_j + e_j,$$

where $a = \hat{\alpha}$ and $b = \hat{\beta}$. The symbol $\hat{\ }$ will be used in this part to stand
for "is estimated by." We can represent these two equations graphically,
as in Fig. 13.1. $E(Y)$ is the true regression line, and $\hat{Y}$ is the estimated
regression line. For the point $P(X_j, Y_j)$ the true error is given
\-^3>x 3)t Liie true error is given 
by $\epsilon_j$ $(QP)$, the estimated residual by $e_j$ $(BP)$, and the difference between
the two by $(QB)$.

In order to obtain values of $a$ and $b$, at least two sets of observations
$(X, Y)$ are required. Of course, if only two sets are obtained $\hat{Y}$ will pass
through the two points, both values of $e$ will be 0, and the values of $a$ and
$b$ can be determined by a simultaneous solution of the two equations
representing the two points. However, if more than two sets of observations
are obtained, we shall have the situation pictured in Fig. 13.1 with a
sample residual, $e$, corresponding to each set $(X, Y)$. When there are more
than two sets of points, some new method of determining $a$ and $b$ must be
found. In the chapter on bivariate distributions, we indicated that $a$ and
$b$ could be determined by the method of moments. There are many
other methods of determining these estimates of the parameters, $\alpha$ and $\beta$,
in order to obtain the "best" linear fit to the data.

In any case, it seems reasonable to make the {el as small as possible. 
But what do we do to make these {e} small ? Many courses are suggested, 

among which are: , 

(i) Minimize the sum of the absolute values of the e. 

(ii) Minimize the greatest of the absolute residuals. 

(in') Minimize the sum of the squares of the residuals ■ . 

Method (iii), called the “method of least squares,” is probably the easiest to apply and has certain optimum properties. It has been shown for fixed X’s that this method produces a linear unbiased estimate of Y which has minimum variance. 1,2 Also, if the errors, $\epsilon$, are NID, the method of least squares produces the same estimates as does the method of maximum likelihood. The method of moments will also give the same estimates for NID errors, provided the regression equation is linear in the parameters.
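To make the three criteria concrete, the following minimal sketch evaluates each of them for two candidate lines. The data and the function names (`residuals`, `criteria`) are hypothetical and chosen only for illustration; the sketch compares candidate lines and does not search for the minimizing line.

```python
# Hypothetical illustrative data; not taken from the text.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1.1, 1.9, 3.2, 3.8, 5.1]


def residuals(a, b, xs, ys):
    """Residuals e = Y - (a + b*X) for a candidate line with intercept a and slope b."""
    return [y - (a + b * x) for x, y in zip(xs, ys)]


def criteria(a, b, xs, ys):
    """Evaluate the quantities minimized under criteria (i), (ii), and (iii)."""
    e = residuals(a, b, xs, ys)
    return {
        "sum_abs": sum(abs(v) for v in e),  # (i) sum of absolute residuals
        "max_abs": max(abs(v) for v in e),  # (ii) greatest absolute residual
        "sum_sq": sum(v * v for v in e),    # (iii) sum of squared residuals
    }


line_1 = criteria(0.0, 1.0, X, Y)  # candidate line Y = X
line_2 = criteria(0.5, 1.0, X, Y)  # same slope, intercept shifted by 0.5

for name in ("sum_abs", "max_abs", "sum_sq"):
    print(name, round(line_1[name], 4), round(line_2[name], 4))
```

For these particular data the first candidate is better under all three criteria; in general the three criteria need not agree on which line is best.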

In the derivations which follow for estimating the parameters in the regression equation, for example, $\alpha$ and $\beta$, we need postulate only that the errors are noncorrelated and have the same variance. When the usual tests of significance (such as t and F) and confidence limits are introduced, it is necessary to assume further that the errors are normally and independently distributed. Of course, noncorrelated normal errors are independent. In general we shall assume NID errors, with the understanding that the assumption of normality can be omitted if the investigator is interested only in point estimates and not in confidence limits or tests of significance.

Regression of Y on a Single Fixed Variate. Let us first consider the problem of estimating the best linear relationship between Y and a single fixed variate, X, so that

$$Y = \alpha + \beta X + \epsilon = a + bX + e,$$

where $\alpha$ and $\beta$ are the parameters and a and b their respective estimates; $\epsilon$ is the true error and e the residual about the estimated regression line ($Y - \hat{Y}$, where $\hat{Y} = a + bX$). We assume that a sample of n X’s are selected (without error) and the corresponding values of Y’s then measured. If it is further assumed that the true errors ($\epsilon$) are independently distributed with zero means and the same variance, $\sigma^2$, the method of least squares will produce unbiased and minimum variance estimates of the parameters, $\alpha$ and $\beta$.

The error sum of squares (SSE) is†

$$\mathrm{SSE} = Se^2 = S(Y - a - bX)^2.$$

a and b are to be determined so as to minimize SSE. The estimating equation for a is

$$\frac{\partial\,\mathrm{SSE}}{\partial a} = 0, \qquad SY = na + bSX.$$

Hence

$$a = \bar{Y} - b\bar{X}.$$

If we insert this value of a in SSE, we find that

$$\mathrm{SSE} = S[(Y - \bar{Y}) - b(X - \bar{X})]^2 = S(y - bx)^2,$$

where $y = (Y - \bar{Y})$ and $x = (X - \bar{X})$. Hence we might as well have written

$$\hat{Y} = \bar{Y} + bx \quad \text{instead of} \quad \hat{Y} = a + bX,$$

and similarly

$$E(Y) = \mu + \beta x,$$

where $\bar{Y}$ is the least-squares estimate of $\mu$ and $\alpha = \mu - \beta\bar{X}$. The least-squares estimating equation for b is

$$\frac{\partial[S(y - bx)^2]}{\partial b} = 0, \qquad b = \frac{Sxy}{Sx^2},$$

where $Sxy = SXY - (SX)(SY)/n$ and $Sx^2 = SX^2 - (SX)^2/n$.

† In Part II the letter S will be used for summation over sample values, while $\Sigma$ will be reserved for a sum of fixed variates.
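The computing formulas $b = Sxy/Sx^2$ and $a = \bar{Y} - b\bar{X}$ can be sketched as a short routine. The function name `fit_line` and the data are hypothetical; the routine assumes at least two distinct X values so that $Sx^2 > 0$.

```python
def fit_line(xs, ys):
    """Least-squares intercept a and slope b from the raw sums SX, SY, SXY, SX^2."""
    n = len(xs)
    SX, SY = sum(xs), sum(ys)
    SXY = sum(x * y for x, y in zip(xs, ys))
    SX2 = sum(x * x for x in xs)
    Sxy = SXY - SX * SY / n   # sum of products adjusted for the means
    Sx2 = SX2 - SX ** 2 / n   # sum of squares of X adjusted for the mean
    b = Sxy / Sx2
    a = SY / n - b * SX / n   # a = Ybar - b * Xbar
    return a, b


# Hypothetical illustrative data; not taken from the text.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1.1, 1.9, 3.2, 3.8, 5.1]
a, b = fit_line(X, Y)

# The two estimating equations imply Se = 0 and SXe = 0 for the residuals e.
e = [y - (a + b * x) for x, y in zip(X, Y)]
sum_e = sum(e)
sum_xe = sum(x * v for x, v in zip(X, e))
print(a, b, sum_e, sum_xe)
```

The last two printed quantities should vanish apart from rounding error, which is a convenient check on hand or machine computation.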

If we substitute this value of b in the equation for SSE,

$$\mathrm{SSE} = S(y - bx)^2 = Sy^2 - bSxy = SY^2 - n\bar{Y}^2 - b^2Sx^2.$$

Hence we can say that the regression has resulted in a reduction of

$$bSxy = b^2Sx^2 = (Sxy)^2/Sx^2 \equiv \mathrm{SSR}$$

in the sum of squares for Y, since $n\bar{Y}^2$ is the reduction attributable to the mean (the sum of squares of deviations from the mean is $Sy^2 = SY^2 - n\bar{Y}^2$). The proportional reduction in the sum of squares attributable to the regression on x is usually indicated as

$$\frac{\mathrm{SSR}}{Sy^2} = \frac{(Sxy)^2}{(Sx^2)(Sy^2)} = r^2,$$

where r is the correlation coefficient between X and Y. That is,

$$\mathrm{SSE} = (1 - r^2)Sy^2.$$
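The partition $Sy^2 = \mathrm{SSR} + \mathrm{SSE}$ and the identity $\mathrm{SSE} = (1 - r^2)Sy^2$ can be checked numerically. The sketch below uses hypothetical data and compares SSE obtained by subtraction with SSE computed directly from the residuals.

```python
# Hypothetical illustrative data; not taken from the text.
X = [1.0, 2.0, 3.0, 4.0, 5.0]
Y = [1.1, 1.9, 3.2, 3.8, 5.1]
n = len(X)

x_bar = sum(X) / n
y_bar = sum(Y) / n
x = [v - x_bar for v in X]  # deviations from the mean of X
y = [v - y_bar for v in Y]  # deviations from the mean of Y

Sx2 = sum(v * v for v in x)
Sy2 = sum(v * v for v in y)
Sxy = sum(p * q for p, q in zip(x, y))

b = Sxy / Sx2
SSR = Sxy ** 2 / Sx2           # reduction in the sum of squares due to regression
SSE = Sy2 - SSR                # error sum of squares by subtraction
r2 = Sxy ** 2 / (Sx2 * Sy2)    # squared correlation coefficient

# Direct computation of SSE from the residuals y - b*x.
SSE_direct = sum((q - b * p) ** 2 for p, q in zip(x, y))
print(SSR, SSE, SSE_direct, r2)
```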

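In terms of the parameters, $\mu$ and $\beta$, and the true errors $\{\epsilon\}$,

$$\bar{Y} = \frac{SY}{n} = \frac{S(\mu + \beta x + \epsilon)}{n} = \mu + \bar{\epsilon},$$

$$y = Y - \bar{Y} = \mu + \beta x + \epsilon - \mu - \bar{\epsilon} = \beta x + (\epsilon - \bar{\epsilon}),$$

$$b = \frac{Sxy}{Sx^2} = \frac{Sx(\beta x + \epsilon - \bar{\epsilon})}{Sx^2} = \beta + \frac{Sx\epsilon}{Sx^2},$$

$$a = \bar{Y} - b\bar{X} = \mu + \bar{\epsilon} - \left(\beta + \frac{Sx\epsilon}{Sx^2}\right)\bar{X} = \alpha + \bar{\epsilon} - \frac{Sx\epsilon}{Sx^2}\,\bar{X},$$

since $Sx = 0$.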

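Since $E(\epsilon) = 0$, we see that $\bar{Y}$, b, and a are, respectively, unbiased estimates of $\mu$, $\beta$, and $\alpha$. And since the $\epsilon$ are independently distributed with the same variance,

$$\sigma^2(\bar{Y}) = \frac{\sigma^2}{n}, \qquad \sigma^2(b) = \frac{\sigma^2}{Sx^2},\dagger$$

and

$$\sigma^2(a) = \sigma^2\left[\frac{1}{n} + \frac{\bar{X}^2}{Sx^2}\right] = \sigma^2\,\frac{SX^2}{nSx^2}.$$

† In Part II, we shall use the notation $\sigma^2(x)$ and $s^2(x)$ to stand for the variance of x and its sample estimate, respectively.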






LEAST-SQUARES ANALYSIS 

The predicted value of Y for a given X, say $X'$, is

$$\hat{Y}' = \bar{Y} + b(X' - \bar{X}) = \bar{Y} + bx'.$$

If the experimenter wants to put confidence limits on $\hat{Y}'$, he must choose one of the following:

(i) The confidence limits for the average of all $Y'$, $E(Y')$, which might occur for the given value $X = X'$.

(ii) The confidence limits for any single predicted value, $Y'$.

For (i) we need to determine the variance of the difference between an ordinate on the computed regression line, $\hat{Y}'$, and the corresponding ordinate on the true regression line, $E(Y')$. This difference is

$$\delta' = \hat{Y}' - E(Y') = (\bar{Y} - \mu) + (b - \beta)x',$$

with variance

$$\sigma^2(\delta') = \sigma^2\left[\frac{1}{n} + \frac{x'^2}{Sx^2}\right].$$


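But for (ii) it is necessary to estimate the variance of the difference between the ordinate on the computed regression line, $\hat{Y}'$, and the corresponding true ordinate, $Y'$. This difference is

$$e' = Y' - \hat{Y}' = \epsilon' - \bar{\epsilon} - \frac{Sx\epsilon}{Sx^2}\,x',$$

with variance

$$\sigma^2(e') = \sigma^2\left[1 + \frac{1}{n} + \frac{x'^2}{Sx^2}\right],$$

where $\epsilon'$ is not one of the $\epsilon$’s in the original sample of n.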

If we now assume that the $\epsilon$ are NID$(0, \sigma^2)$, SSE is distributed as $\chi^2\sigma^2$ with $(n - 2)$ degrees of freedom, so that

$$s^2 = \mathrm{SSE}/(n - 2)$$

is an unbiased estimate of $\sigma^2$. The proof is as follows:

$$\mathrm{SSE} = Se^2 = S\left[\epsilon - \bar{\epsilon} - \frac{Sx\epsilon}{Sx^2}\,x\right]^2 = S\epsilon^2 - n\bar{\epsilon}^2 - \frac{(Sx\epsilon)^2}{Sx^2}.$$

Set $\sqrt{n}\,\bar{\epsilon} = \sqrt{n}\,(\bar{Y} - \mu) = z_0$ and $(Sx\epsilon)/\sqrt{Sx^2} = (b - \beta)\sqrt{Sx^2} = z_1$. Then

$$\mathrm{SSE} = S\epsilon^2 - z_0^2 - z_1^2.$$

But $z_0$ and $z_1$ are orthogonal linear functions of NID$(0, \sigma^2)$ variables, each being itself NID$(0, \sigma^2)$. Hence each $z^2$ is independently distributed as $\chi^2\sigma^2$ with 1 degree of freedom, and SSE is then distributed as $\chi^2\sigma^2$ with $(n - 2)$ degrees of freedom.† Therefore we can use $s^2$ as an estimate of $\sigma^2$ in the above variances and obtain the following confidence limits:

(i) $\hat{Y}' - t_\alpha s(\delta') < E(Y') < \hat{Y}' + t_\alpha s(\delta')$,

(ii) $\hat{Y}' - t_\alpha s(e') < Y' < \hat{Y}' + t_\alpha s(e')$,

where $s(\delta')$ and $s(e')$ are the same as the corresponding $\sigma$’s but with $\sigma$ replaced by s, and $P(t > t_\alpha) = \alpha/2$.

In both cases, we estimate $\hat{Y}' = \bar{Y} + bx'$, where $\bar{Y}$ and b were computed from the previous sample. The two sets of confidence limits reflect two different uses of this estimate: those for (i) are concerned with estimating the average Y for the given X, while those for (ii) are concerned with a single Y on a given experiment. It should be reemphasized that we assume a distribution of Y’s for each X and that the second confidence interval is necessarily much wider because the variability of separate Y’s is also considered. It should be noted that $s^2(e') = s^2(\delta') + s^2$.

These results are illustrated in Fig. 13.2. The reader will note that the confidence lines form a hyperbola, with the curvature being much greater for the average Y $[\hat{Y} \pm ts(\delta)]$ than for a single predicted Y $[\hat{Y} \pm ts(e)]$.

† See Sec. 14.3 for a formal proof of this. The essence of the proof is that SSE is the sum of $(n - 2)$ independent $\chi^2\sigma^2$ variables, each with 1 degree of freedom.

Finally we can use Student’s t as a test criterion to test the null hypothesis $\beta = \beta_0$:

$$t = (b - \beta_0)\sqrt{Sx^2}/s,$$

since the estimated variance of b will be $s^2/Sx^2$.

We note that

$$\mathrm{SSR} = b^2Sx^2 = (z_1 + \beta\sqrt{Sx^2})^2 = (z_1^2 + 2z_1\beta\sqrt{Sx^2} + \beta^2Sx^2).$$

Hence SSR is distributed as $\chi^2\sigma^2$ with 1 degree of freedom when $\beta = 0$, and we can use the ratio

$$F = \mathrm{SSR}/s^2$$

to test the null hypothesis that $\beta = 0$. Also

$$E(\mathrm{SSR}) = \sigma^2 + \beta^2Sx^2 \geq \sigma^2,$$

indicating that the one-tailed F test is the appropriate one. A convenient method of displaying these results is the analysis-of-variance table.


Source of      Degrees of    Sum of     Mean square             E(MS)†
variation      freedom       squares

Regression     1             SSR        MSR = SSR               $\sigma^2 + \beta^2Sx^2$
Error          n - 2         SSE        $s^2$ = SSE/(n - 2)     $\sigma^2$

† The mean square (MS) is the sum of squares divided by the degrees of freedom.


One of the basic assumptions in the use of the method of least squares 
is that the errors are noncorrelated. J. Durbin and G. S. Watson have 
developed a statistic to test this assumption and have computed upper 
and lower bounds for the significance levels. 3 A discussion of regression 
analysis when the variance is not assumed constant for all X’s is presented 
in Sec. 14.4.

The results of the regression analysis cannot be applied to the entire 
(X,Y) population—only to the set of X’s used in the analysis. C. P. 
Winsor 4 has prepared an excellent discussion of the problem of fitting 
regressions when errors of measurement are present in one or both sets of 
variates and when a random bivariate sample is secured. Wald 5 and
Bartlett 6 have presented methods of fitting a straight line when both
variables are subject to error.

If a random bivariate sample has been secured, methods have been 
devised to obtain estimates of and confidence limits for the value of X for 
a future Y, as well as for the value of Y for a future X. The method of 
least squares, regarding X as fixed, produces the same results as the 
bivariate solution when predicting Y for a future X but not for the inverse problem of predicting X for a future Y. Eisenhart, 7 Bliss, 8 and Winsor 4 have discussed the latter problem when the method of least squares was used to estimate the regression line.

Table 13.1
Data for Example 13.1†

 X      n       SY     n'      S'Y   |    X      n       SY     n'      S'Y
39      3      1.9      3      1.9   |   65     11     17.4     11     17.4
41      3      3.4      3      3.4   |   66     19     36.3     17     23.9
42     18     13.1     18     13.1   |   67      8     15.3      7     10.0
43      1       .7      1       .7   |   68      7     14.9      6     10.5
44     11      7.1     11      7.1   |   69     11     23.8     10     19.8
45     51     49.9     51     49.9   |   70     12     22.7     10     14.0
46      3      2.6      3      2.6   |   71     13     23.4     13     23.4
47     34     31.9     33     27.6   |   72      8     17.2      8     17.2
48     86     78.3     86     78.3   |   73      9     12.2      9     12.2
49     17     10.7     17     10.7   |   74     12     43.1      9     20.7
50     51     44.3     51     44.3   |   75     14     34.0     13     28.5
51     39     40.0     39     40.0   |   76     11     22.9     11     22.9
52     31     36.2     31     36.2   |   77      9     33.0      6      9.0
53     63     65.2     63     65.2   |   78      4      7.5      4      7.5
54     45     64.5     45     64.5   |   79     14     29.6     13     23.8
55     52     55.1     52     55.1   |   80      6     11.8      6     11.8
56     39     58.0     39     58.0   |   81      6     17.5      5     13.2
57     23     34.2     22     28.3   |   82      3     13.6      2      5.2
58     30     37.7     30     37.7   |   83      4     32.0      2      3.8
59     25     41.4     25     41.4   |   84      7     24.7      5     12.2
60     17     26.0     16     22.0   |   85      2      5.5      2      5.5
61     26     41.0     25     31.9   |   88      2      9.9      0      0
62     11     15.6     10     11.4   |   89      1     16.3      0      0
63     24     32.8     24     32.8   |   91      1     10.3      0      0
64     12     29.5      9     16.4   | Total   909  1,316.0    876  1,093.0

† X = socioeconomic score; Y = net farm income ($1,000); n = number of farmers; n' = number of farmers with income less than $4,000.

Example 13.1. A study was made of the relationship between the net income (Y) of 909 Southern farm families and a socioeconomic score (X), the latter based on the possession of certain items such as radio, telephone, automobile, and electricity and the education and church attendance of the heads of the families. 9 The possible range of X’s was 39 to 91. The number of sample families and the total income for each X are presented in Table 13.1, for all families and for the families with incomes less than $4,000. A sample of these data is presented graphically in Fig. 13.3. The following sums and sums of squares and cross products were obtained:

$$SX = 51{,}852, \qquad SY = 1{,}316,$$

$$SX^2 = 3{,}053{,}808, \qquad SXY = 81{,}621, \qquad SY^2 = 3{,}898,$$

where Y was in terms of thousands of dollars. The sums of squares and products adjusted for the means were

$$Sx^2 = 3{,}053{,}808 - (51{,}852)^2/909 = 96{,}019,$$

$$Sxy = 81{,}621 - (51{,}852)(1{,}316)/909 = 6{,}552.5,$$

$$Sy^2 = 3{,}898 - (1{,}316)^2/909 = 1{,}993.$$

Fig. 13.3. Regression of net income on socioeconomic score.

Hence

$$b = Sxy/Sx^2 = .06824, \qquad \bar{Y} = 1.448, \qquad \bar{X} = 57.04,$$

$$\mathrm{SSR} = bSxy = (Sxy)^2/Sx^2 = 447, \qquad \mathrm{SSE} = Sy^2 - \mathrm{SSR} = 1{,}546,$$

$$s^2 = \mathrm{SSE}/907 = 1.704, \qquad s = 1.305, \qquad s(b) = s/\sqrt{Sx^2} = .004211,$$

$$t = b/s(b) = 16.2.$$

The 95 per cent confidence limits for $\beta$ are $\beta = .06824 \pm (1.96)(.004211)$, or

$$.05999 < \beta < .07649,$$

since the 95 per cent value of t is 1.96 for 907 degrees of freedom. Hence one could state that on the average an increase of one unit on the socioeconomic score would be associated with an increase of from $59.99 to $76.49 in net income. The analysis-of-variance table is:


Source of      Degrees of    Sum of     Mean
variation      freedom       squares    square

Regression       1             447      447
Error          907           1,546        1.704

$$F = 447/1.704 = 262.3 = t^2.$$
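The arithmetic of Example 13.1 can be retraced from the five raw sums quoted in the example. The sketch below is only a check on the computations; the variable names are hypothetical, and because it carries full precision rather than the rounded intermediate values, the last digit of some results can differ slightly from the printed figures (F, for instance, comes out near 262.4).

```python
import math

# Raw sums quoted in Example 13.1 (Y in thousands of dollars).
n = 909
SX, SY = 51852.0, 1316.0
SX2, SXY, SY2 = 3053808.0, 81621.0, 3898.0

# Sums of squares and products adjusted for the means.
Sx2 = SX2 - SX ** 2 / n
Sxy = SXY - SX * SY / n
Sy2 = SY2 - SY ** 2 / n

b = Sxy / Sx2                 # regression coefficient
SSR = Sxy ** 2 / Sx2          # reduction due to regression
SSE = Sy2 - SSR               # error sum of squares
s2 = SSE / (n - 2)            # error mean square
s_b = math.sqrt(s2 / Sx2)     # standard error of b
t = b / s_b                   # test of beta = 0
F = SSR / s2                  # equals t squared

# 95 per cent limits for beta, using t = 1.96 as in the text.
lower, upper = b - 1.96 * s_b, b + 1.96 * s_b
print(round(b, 5), round(SSR), round(SSE), round(s2, 3), round(t, 1), round(F, 1))
print(round(lower, 5), round(upper, 5))
```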


There is an undoubted significant relationship between net income and socioeconomic score, but the percentage of variation accounted for by the regression is only (447)100/1,993 = 22 per cent. It was hoped that the relationship would be close enough so that in future surveys an adequate measure of income could be obtained from the socioeconomic score, since it is much easier to obtain socioeconomic information than income data (most of the socioeconomic items can be observed by the interviewer; hence interviewing biases can be eliminated). However, if only 22 per cent of the total income variability can be explained by the socioeconomic score, some other means of estimating net income must be devised. An attempt was made to obtain a possibly better fit to the data displayed in Fig. 13.3 by redefining the population to contain only incomes less than $4,000. However, when a new fit was attempted, the proportional reduction in sum of squares due to regression was even less; hence this approach also was abandoned. Since the variation about the regression line seemed to increase with increasing X, it was thought that a logarithmic relationship might fit the data better; however, there was no real improvement in the percentage of variability accounted for by the regression.

The failure of the socioeconomic score to estimate net income is shown in the 95 per cent confidence limits for a predicted value of Y.

(i) $E(Y') = \hat{Y}' \pm 2.558\sqrt{.00111 + \dfrac{x'^2}{96{,}019}}$,

(ii) $Y' = \hat{Y}' \pm 2.558\sqrt{1.00111 + \dfrac{x'^2}{96{,}019}}$.

As an example, consider the confidence limits if $X' = 80$. In this case $\hat{Y}' = \bar{Y} + bx' = 1.448 + (0.06824)(22.96) = 3.015$. Hence the 95 per cent confidence limits are

(i) $E(Y') = 3.015 \pm 2.558\sqrt{.006600} = 3.015 \pm .208$,

(ii) $Y' = 3.015 \pm 2.558\sqrt{1.006600} = 3.015 \pm 2.567$.

Since it was hoped to use this regression equation for an individual family, the appropriate set of confidence limits to consider is (ii):

$$.448 < Y' < 5.582.$$

Hence we could only estimate the income as falling between $448 and $5,582 with 95 per cent confidence. The regression line, $\hat{Y}$, and the 95 per cent confidence band for a single predicted Y are shown in Fig. 13.3. Note that the confidence lines are practically parallel to $\hat{Y}$, because $s^2(e) = s^2(\delta) + s^2$, and $s^2$ is the dominant term.
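The two kinds of limits can be computed with one small routine. The sketch below is a hypothetical helper (`prediction_limits` is not a name from the text) applied to the rounded values quoted in Example 13.1 for $X' = 80$; the tabled value t = 1.96 is passed in by hand rather than looked up.

```python
import math


def prediction_limits(x_dev, y_hat, s, n, Sx2, t):
    """Return ((lo, hi) for E(Y'), (lo, hi) for a single Y') at deviation x' = X' - Xbar."""
    half_mean = t * s * math.sqrt(1.0 / n + x_dev ** 2 / Sx2)          # uses s(delta')
    half_single = t * s * math.sqrt(1.0 + 1.0 / n + x_dev ** 2 / Sx2)  # uses s(e')
    return ((y_hat - half_mean, y_hat + half_mean),
            (y_hat - half_single, y_hat + half_single))


# Rounded values quoted in Example 13.1.
n, Sx2 = 909, 96019.0
Y_bar, X_bar, b, s, t = 1.448, 57.04, 0.06824, 1.305, 1.96

x_dev = 80 - X_bar             # x' for X' = 80
y_hat = Y_bar + b * x_dev      # predicted income in thousands of dollars
mean_limits, single_limits = prediction_limits(x_dev, y_hat, s, n, Sx2, t)
print([round(v, 3) for v in mean_limits])
print([round(v, 3) for v in single_limits])
```

The results agree with the printed limits to about the third decimal place; the small differences come from rounding in the hand computation. The interval for a single family is more than ten times as wide as the interval for the average, because the term 1 under the square root dominates.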


EXERCISES 

13.1. Use the confidence limits for predicting $E(Y')$ for a future $X'$ to show that the confidence limits of $X'$ for a future $Y'$ are

$$X' = \bar{X} + \frac{b(Y' - \bar{Y})}{\lambda} \pm \frac{t_\alpha s}{\lambda}\sqrt{\frac{\lambda}{n} + \frac{(Y' - \bar{Y})^2}{Sx^2}},$$

where $\lambda = b^2 - (t_\alpha^2 s^2/Sx^2)$ and $\bar{X}$, $\bar{Y}$, b, s, and $Sx^2$ are based on the original sample of n.

13.2. What changes would be made in the confidence limits in Exercise 
13.1 if Y’ were the average of k observations? 

13.3. The analysis of socioeconomic scores is simplified for those items with only two alternatives, for example, with or without electricity. Suppose we want to correlate the scores on one such item with income. Let $w_0$ be the score for each of the $n_0$ families without this item and $w_1$ the score for each of the $n_1$ families with this item ($n_0 + n_1 = n$ total families). Show that $r^2$ is independent of $w_0$ and $w_1$.

13.4. Girshick and Haavelmo have made an analysis of the demand for food in the United States for the years 1922 to 1941. 10 One equation in their analysis involved the relationship between disposable income adjusted for the cost of living (Y) and investment per capita adjusted for the cost of living ($X_1$). The values of Y and $X_1$ are shown in the accompanying table.

Data for Exercise 13.4

    Y      X1   |      Y      X1   |        Y        X1
 87.4    92.9   |  107.8   142.9   |    103.1     114.3
 97.6   142.9   |   96.6    92.9   |    105.1     121.4
 96.7   100.0   |   88.9    97.6   |     96.4      78.6
 98.2   123.8   |   75.1    52.4   |    104.4     109.5
 99.8   111.9   |   76.9    40.5   |    110.7     128.6
100.5   121.4   |   84.6    64.3   |    127.1     238.1
103.2   107.1   |   90.6    78.6   |
                                   Total  1,950.7   2,159.7

(a) Set up a simple linear regression of Y on $X_1$, and determine the constants in the regression equation.

(b) Set up the analysis of variance.

(c) Make a test of significance of the usefulness of the regression equation. Are there any aspects of these data which might invalidate this test?

(d) Plot the data (rounded to nearest integer), and draw in the regression line and the 95 per cent confidence lines for $E(Y|X_1)$. From the nature of the residuals from the regression line, would you suggest any changes in the form of the regression equation?

13.5. R. A. Fisher has compared the body weights (in kilograms) with 
the heart weights (in grams) of 47 female and 97 male cats. 11 The sums 
of squares and products were as follows: 



                        Degrees of
                        freedom      (Body)²    (Body × heart)    (Heart)²

Females:
  Total                    47        265.13       1029.62         4064.71
  Correction for mean       1        261.677      1020.516        3979.92
  Difference               46          3.453         9.104          84.79

Males:
  Total                    97        836.75       3275.55        13056.17
  Correction for mean       1        815.77       3185.07        12435.70
  Difference               96         20.98         90.48          620.47

(a) Determine the regression of heart weight on body weight for both males and females.

(b) Are these two regressions different from one another?

(c) Are the two error variances essentially the same?

13.6. In a study of lobster population, D. B. DeLury 12 presents the following data on the catch per unit of effort for the time interval t, C(t), and the total catch up to t, K(t), in thousands of pounds:

t     1    2    3    4    5    6    7    8    9   10   11   12   13   14   15   16   17
C   .82  .75  .94  .80  .83  .89  .70  .58  .64  .55  .52  .45  .45  .49  .45  .48  .43
K     0    7   13   16   22   25   32   37   40   45   50   53   54   55   57   60   62

(a) A linear equation of the form $C = a + bK + e$ was set up. Determine the values of a and b and their standard errors.

(b) The total population at time t = 0 is estimated by $N_0 = -a/b$. Determine $N_0$.

13.7. Read a brief note in the December, 1948, issue of the American Statistician (pages 16 to 17) on the use of regression methods for business statistics.

13.8. Suppose a sample of $n_1$ is used to estimate the parameters in the equation $Y_1 = \mu_1 + \beta_1 x_1 + \epsilon_1$ and a sample of $n_2$ for the equation $Y_2 = \mu_2 + \beta_2 x_2 + \epsilon_2$, where $\sigma_1^2$ is assumed to equal $\sigma_2^2$. How would you test the null hypothesis that $\beta_1 = \beta_2$? What would you do if $\sigma_1^2 \neq \sigma_2^2$?

13.9. (a) Show that $\bar{Y}$ and b are the ML estimates of $\mu$ and $\beta$, respectively.

(b) What is the ML estimate of $\sigma^2$? Is this estimate unbiased?

13.10. Use the data in Table 13.1 for incomes less than $4,000 to fit a 
new regression of net income on socioeconomic score ( Sy 2 = 623 for these 
incomes under $4,000). 

13.11. As mentioned in Example 13.1, a logarithmic relationship was also used. Since there were some negative incomes, $Z = \log(Y + 1)$ was used as the dependent variate with X still the fixed variate (Y in terms of thousands of dollars). The data for this regression analysis are presented in the table (the values of n and n' are not reproduced here, as they are the same as in Table 13.1).

Data for Exercise 13.11

 X        SZ       S'Z   |    X        SZ       S'Z
39     .5215     .5215   |   65    4.4110    4.4110
41     .9755     .9755   |   66    7.7002    5.9932
42    3.8118    3.8118   |   67    3.3000    2.5018
43     .2217     .2217   |   68    3.1725    2.4361
44    2.2455    2.2455   |   69    5.1685    4.6628
45   13.8419   13.8419   |   70    5.1154    3.6538
46     .7811     .7811   |   71    5.3964    5.3964
47    9.0067    8.2792   |   72    3.6718    3.6718
48   22.2637   22.2637   |   73    3.0164    3.0164
49    3.3421    3.3421   |   74    7.2008    4.5188
50   12.7787   12.7787   |   75    7.0135    6.2004
51   11.1653   11.1653   |   76    5.2177    5.2177
52    9.6366    9.6366   |   77    4.9095    2.0622
53   18.1929   18.1929   |   78    1.7054    1.7054
54   16.1329   16.1329   |   79    6.3518    5.5146
55   15.4700   15.4700   |   80    2.8142    2.8142
56   14.7077   14.7077   |   81    3.4868    2.7624
57    7.9962    7.1611   |   82    2.0645    1.0923
58    9.8522    9.8522   |   83    3.2244     .9187
59    9.8317    9.8317   |   84    4.3589    2.6452
60    6.3001    5.5959   |   85    1.1415    1.1415
61    9.3233    8.3207   |   88    1.5469    0
62    3.9073    3.1950   |   89    1.2370    0
63    8.2960    8.2960   |   91    1.0531    0
64    5.6077    3.4258   | Total 310.4883  282.3832

(a) Fit a new regression line using all incomes, but with $Z = \log(Y + 1)$ as the dependent variate. ($Sz^2 = 32.279$.)

(b) Do the same for the incomes under $4,000. ($Sz^2 = 22.470$.)

CHAPTER 14

GENERAL REGRESSION MODEL WITH r FIXED VARIATES 

14.1. Introduction. Let us suppose that we can approximate Y by means of the general linear equation

$$Y = \mu + \sum_{i=1}^{r} \beta_i x_i + \epsilon = \bar{Y} + \sum_{i=1}^{r} b_i x_i + e, \tag{1}$$

where the regression coefficients $\{b_i\}$ are to be determined from $n\ (> r + 1)$ simultaneous observations on Y and the $X_i$’s ($x_i = X_i - \bar{X}_i$). The first relationship represents the true experimental model in terms of the parameters ($\mu$ and the $\beta_i$) and the true error, $\epsilon$, while the second is in terms of the estimates of these parameters and the residual, e. The b’s are determined by minimizing the sum of the squared residuals.

The usual assumptions are:

(i) The $\{X_i\}$ are fixed variates and may be looked upon as population parameters. Often the X’s are chosen deliberately and the Y’s are produced or chosen at random.

(ii) For a fixed set of X’s, say $\{X_i'\}$, the Y’s associated with this set are NID with mean $E(Y') = \mu + \sum_{i=1}^{r} \beta_i x_i'$ and variance $\sigma^2$. The observed regression surface is $\hat{Y}' = \bar{Y} + \sum b_i x_i'$. As mentioned in Sec. 13.1, the assumption of normality is required only when confidence limits and tests of significance are used.

(iii) For any set of X’s, the variance of Y shall be the same; this is the assumption of homoscedasticity.

Even though we are considering only one Y for each set of X’s, it is understood that there is an underlying normal population of Y’s for each set of X’s and that the residuals from the true regression surface are NID$(0, \sigma^2)$. The assumption of fixed X’s indicates that the results cannot be applied to the entire multivariate $(Y, X_i)$ population, only to the sets of X’s used in the analysis.

The form of the general equation should be based on some theoretical framework, which a research man in the particular field of application should be asked to furnish. In many cases the particular set of fixed variates may not be the ideal ones from a theoretical point of view, but they may be the best substitutes for which data are available. Also, the form of the regression model might not be ideal, but often the ease of computing from a linear model will outweigh the advantage of using a more exact but cumbersome model. Of course, if the nonlinear model is simply a multiplicative one, it can be linearized by use of a logarithmic transformation, provided the error is also multiplicative.

In a graduate thesis, Monroe 1 has presented some of the uses of nonlinear models for nutritional experiments, plus an extensive bibliography on the subject. An article on this subject by Hartley 2 outlines a method of using least-squares estimation for nonlinear parameters.

The error sum of squares, which is to be minimized, is

$$\mathrm{SSE} = Se^2 = S\Big(y - \sum_{i=1}^{r} b_i x_i\Big)^2, \tag{2}$$

where we shall use S for summation over the sample values and $\Sigma$ for summation of variates. The following general equation is obtained when SSE is minimized with respect to $b_k$ $(k = 1, 2, \ldots, r)$:

$$b_1Sx_kx_1 + b_2Sx_kx_2 + \cdots + b_kSx_k^2 + \cdots + b_rSx_kx_r = Sx_ky. \tag{3}$$

In order to simplify the presentation which follows, let $Sx_ix_j = a_{ij}$ and $Sx_iy = g_i$, where $a_{ij} = a_{ji}$. Then we can write the set of r equations in the r unknown $\{b_i\}$ as follows:

$$\begin{aligned}
a_{11}b_1 + a_{12}b_2 + \cdots + a_{1k}b_k + \cdots + a_{1r}b_r &= g_1,\\
a_{21}b_1 + a_{22}b_2 + \cdots + a_{2k}b_k + \cdots + a_{2r}b_r &= g_2,\\
&\;\;\vdots\\
a_{k1}b_1 + a_{k2}b_2 + \cdots + a_{kk}b_k + \cdots + a_{kr}b_r &= g_k,\\
&\;\;\vdots\\
a_{r1}b_1 + a_{r2}b_2 + \cdots + a_{rk}b_k + \cdots + a_{rr}b_r &= g_r.
\end{aligned}$$

These are called the normal equations.
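As a minimal sketch of how the coefficients $a_{ij} = Sx_ix_j$ and right-hand sides $g_i = Sx_iy$ are assembled, the routine below forms them from raw observations on two fixed variates. The function name `normal_equations` and the data are hypothetical; the Y values were constructed as exactly $1 + 2X_1 + 3X_2$ so that the solution $b_1 = 2$, $b_2 = 3$ is known in advance.

```python
def normal_equations(X_cols, Y):
    """Return (A, g) with A[i][j] = S x_i x_j and g[i] = S x_i y, using deviations from means."""
    n = len(Y)
    y_bar = sum(Y) / n
    y = [v - y_bar for v in Y]
    x = []
    for col in X_cols:
        mean = sum(col) / n
        x.append([v - mean for v in col])
    r = len(x)
    A = [[sum(p * q for p, q in zip(x[i], x[j])) for j in range(r)] for i in range(r)]
    g = [sum(p * q for p, q in zip(x[i], y)) for i in range(r)]
    return A, g


# Hypothetical data: Y = 1 + 2*X1 + 3*X2 exactly, with no error term.
X1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0]
Y = [1 + 2 * p + 3 * q for p, q in zip(X1, X2)]

A, g = normal_equations([X1, X2], Y)
for row, rhs in zip(A, g):
    print([round(v, 4) for v in row], round(rhs, 4))
```

Substituting $b_1 = 2$ and $b_2 = 3$ into each printed row reproduces the corresponding right-hand side, as the normal equations require.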

This system of r linear equations can be solved for the $\{b_i\}$ by the usual methods of simultaneous equations given in elementary algebra. Methods of solution have been presented by Snedecor 3 and in a special computing manual by Wallace and Snedecor. 4 However, we believe that, in general, it is better to solve for the b’s by use of a method called matrix inversion. Computing techniques have been devised so that they can be followed without a knowledge of matrix theory. A detailed discussion of these techniques and the necessary matrix theory are presented by Dwyer. 5 We shall present two computing techniques in Chap. 15.

In order to determine the b’s by a method of matrix inversion, an intermediate computing step is necessary. A new set of $r^2$ constants $\{c_{ij}\}$ must be determined. Suppose we arrange the a’s and c’s in rows and columns as follows:

$$A = \begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1r}\\ a_{21} & a_{22} & \cdots & a_{2r}\\ \vdots & \vdots & & \vdots\\ a_{r1} & a_{r2} & \cdots & a_{rr} \end{bmatrix}, \qquad C = \begin{bmatrix} c_{11} & c_{12} & \cdots & c_{1r}\\ c_{21} & c_{22} & \cdots & c_{2r}\\ \vdots & \vdots & & \vdots\\ c_{r1} & c_{r2} & \cdots & c_{rr} \end{bmatrix}.$$

These arrangements are called matrices, and the individual a’s and c’s are called elements. The c’s must satisfy the following conditions:

$$\sum_{j=1}^{r} a_{ij}c_{jk} = \begin{cases} 1, & i = k,\\ 0, & i \neq k. \end{cases} \tag{5}$$

In other words, the sum of the products of corresponding elements in a row of A and a column of C must be unity for the same row and column (for example, the first row of A and the first column of C) and must be zero for an unlike row and column (for example, the first row of A and the second column of C). The C matrix is called the inverse of the A matrix. The computing techniques mentioned above refer to the computations required to determine the c’s.

After the c’s have been determined, the solution for $b_k$ is as follows:

$$b_k = \sum_{j=1}^{r} c_{kj}g_j = \sum_{j=1}^{r} c_{kj}Sx_jy, \qquad k = 1, 2, \ldots, r. \tag{6}$$

In other words, any $b_k$ can be computed by adding up the products of the elements in the kth row of C times the corresponding g’s. A brief discussion of this matrix theory is presented in Sec. 14.2 for those readers who desire a more theoretical presentation.
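A short sketch of equation (6) follows. The inverse is found here by Gauss-Jordan elimination with partial pivoting, which is an assumption made for illustration and is not one of the computing techniques of Chap. 15; the function name `invert` and the numerical values of A and g are hypothetical.

```python
def invert(A):
    """Inverse of a square matrix by Gauss-Jordan elimination with partial pivoting."""
    r = len(A)
    # Augment A with the identity matrix: [A | I].
    M = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(r)]
         for i, row in enumerate(A)]
    for col in range(r):
        pivot_row = max(range(col, r), key=lambda i: abs(M[i][col]))
        if abs(M[pivot_row][col]) < 1e-12:
            raise ValueError("matrix is singular to working precision")
        M[col], M[pivot_row] = M[pivot_row], M[col]
        pivot = M[col][col]
        M[col] = [v / pivot for v in M[col]]
        for i in range(r):
            if i != col:
                factor = M[i][col]
                M[i] = [vi - factor * vc for vi, vc in zip(M[i], M[col])]
    return [row[r:] for row in M]  # the right-hand block is now the inverse


# Hypothetical normal equations with r = 2.
A = [[10.0, 10.0], [10.0, 14.8]]
g = [50.0, 64.4]

C = invert(A)
# Equation (6): b_k is the kth row of C times the g's.
b = [sum(C[k][j] * g[j] for j in range(len(g))) for k in range(len(g))]

# Condition (5): rows of A times columns of C give 1 on the diagonal and 0 elsewhere.
AC = [[sum(A[i][j] * C[j][k] for j in range(2)) for k in range(2)] for i in range(2)]
print(b, AC)
```

Once C is available, a new set of g’s (for example, a different dependent variate on the same X’s) requires only the row-by-column products of equation (6) and no further elimination.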

From the model equation (1), we see that

$$y = Y - \bar{Y} = \sum \beta_i x_i + \epsilon - \bar{\epsilon},$$

where $\bar{\epsilon} = S\epsilon/n$. Hence using the definition $Sx_jx_i = a_{ji}$ and equation (5),

$$b_k = \sum_j c_{kj}\Big[Sx_j\Big(\sum_i \beta_i x_i + \epsilon - \bar{\epsilon}\Big)\Big] = \sum_i \Big(\sum_j c_{kj}a_{ji}\Big)\beta_i + \sum_j c_{kj}(Sx_j\epsilon) = \beta_k + \sum_j c_{kj}(Sx_j\epsilon).$$

Since the $\{\epsilon\}$ are NID$(0, \sigma^2)$,

$E(b_k) = \beta_k$ (indicating $b_k$ is an unbiased estimate of $\beta_k$);

$\sigma^2(b_k) = E[(b_k - \beta_k)^2] = E[(\sum c_{kj}Sx_j\epsilon)^2] = c_{kk}\sigma^2$;

$\sigma(b_ib_k) = c_{ik}\sigma^2$;

$\sigma^2(b_i - b_k) = (c_{ii} - 2c_{ik} + c_{kk})\sigma^2$.

In order to prove that $\sigma^2(b_k) = c_{kk}\sigma^2$, we make use of the linear-form techniques of Part I. $\sum_j c_{kj}Sx_j\epsilon \equiv l$ can be represented as a linear function of the n $\epsilon$’s, that is,

$$l = \mathop{S}_{p=1}^{n} w_p\epsilon_p,$$

where $w_p = \sum_j c_{kj}x_{jp}$ and $x_{jp}$ is the pth observation on $x_j$. Since the $\epsilon$ are NID$(0, \sigma^2)$, $\sigma^2(l) = (Sw^2)\sigma^2$. But we can write

$$Sw^2 = \mathop{S}_{p}\Big(\sum_{j=1}^{r} c_{kj}x_{jp}\Big)\Big(\sum_{j'=1}^{r} c_{kj'}x_{j'p}\Big) = \sum_j \sum_{j'} c_{kj}c_{kj'}a_{j'j} = c_{kk},$$

since

$$\sum_{j'} c_{kj'}a_{j'j} = \begin{cases} 1 & \text{when } k = j,\\ 0 & \text{when } k \neq j. \end{cases}$$

A similar proof can be set up for $\sigma(b_ib_k)$.

It should be noted that since the $\epsilon$ are assumed NID$(0, \sigma^2)$, the $\{b_i\}$ are multivariate normally distributed with means $\{\beta_i\}$, variances $\{c_{ii}\sigma^2\}$, and covariances $c_{ik}\sigma^2$.

The error sum of squares, as given in equation (2), is

$$\begin{aligned}
\mathrm{SSE} &= S\Big(y - \sum b_i x_i\Big)^2\\
&= Sy^2 - \Big(\sum_{i=1}^{r} b_iSx_iy\Big) + \sum_{i=1}^{r} b_i\Big(\sum_{k=1}^{r} b_kSx_ix_k - Sx_iy\Big)\\
&= Sy^2 - \Big(\sum_{i=1}^{r} b_iSx_iy\Big)
\end{aligned}$$

because the values in parentheses of the second equation are simply the normal equations, where $\sum_k b_kSx_ix_k = Sx_iy$. Hence the reduction due to regression is

$$\mathrm{SSR} = \sum_{i=1}^{r} b_iSx_iy \equiv R^2Sy^2,$$

where R is called the multiple correlation coefficient.

In Sec. 14.3, we shall prove that†

$$E(\mathrm{SSE}) = (n - r - 1)\sigma^2, \qquad s^2 = \mathrm{SSE}/(n - r - 1),$$

$$E(\mathrm{SSR}) = r\sigma^2 \quad \text{when all } \beta_i = 0.$$

Hence $F = \mathrm{SSR}/rs^2$ can be used to test the null hypothesis that all $\beta_i = 0$. F has $(r, n - r - 1)$ degrees of freedom. Also, $E(\mathrm{SSR}) > r\sigma^2$ when some $\beta_i \neq 0$.
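The quantities SSR, SSE, $s^2$, $R^2$, and F can be traced through a small numerical case. The sketch below uses hypothetical data with r = 2 fixed variates and n = 5 observations, solves the normal equations by Gauss-Jordan elimination (an assumed method, chosen only to keep the example self-contained), and checks that SSE obtained as $Sy^2 - \mathrm{SSR}$ agrees with the sum of squared residuals.

```python
def solve(A, g):
    """Solve the normal equations A b = g by Gauss-Jordan elimination with partial pivoting."""
    r = len(A)
    M = [[float(v) for v in A[i]] + [float(g[i])] for i in range(r)]
    for c in range(r):
        p = max(range(c, r), key=lambda i: abs(M[i][c]))
        M[c], M[p] = M[p], M[c]
        pivot = M[c][c]
        M[c] = [v / pivot for v in M[c]]
        for i in range(r):
            if i != c:
                f = M[i][c]
                M[i] = [vi - f * vc for vi, vc in zip(M[i], M[c])]
    return [M[i][r] for i in range(r)]


def deviations(values):
    """Deviations of the values from their mean."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


# Hypothetical data; not taken from the text.
X1 = [1.0, 2.0, 3.0, 4.0, 5.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0]
Y = [9.5, 7.5, 19.5, 18.0, 28.5]
n, r = len(Y), 2

x = [deviations(X1), deviations(X2)]
y = deviations(Y)
A = [[sum(p * q for p, q in zip(x[i], x[j])) for j in range(r)] for i in range(r)]
g = [sum(p * q for p, q in zip(x[i], y)) for i in range(r)]

b = solve(A, g)
Sy2 = sum(v * v for v in y)
SSR = sum(bi * gi for bi, gi in zip(b, g))  # reduction due to regression
SSE = Sy2 - SSR                             # error sum of squares
s2 = SSE / (n - r - 1)                      # error mean square
R2 = SSR / Sy2                              # squared multiple correlation
F = SSR / (r * s2)                          # F with (r, n - r - 1) degrees of freedom

# Direct check: sum of squared residuals about the fitted surface.
SSE_direct = sum((y[p] - sum(b[i] * x[i][p] for i in range(r))) ** 2 for p in range(n))
print(b, SSR, SSE, SSE_direct, R2, F)
```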

If it is desired to know if the last $(r - k)$ of the r fixed variates made a significant contribution to SSR, we can obtain the reduction due to the first k fixed variates by using

$$\hat{Y} = \bar{Y} + b_1'x_1 + \cdots + b_k'x_k.$$

This reduction will be called SSR*. Then the added reduction due to the last $(r - k)$ fixed variates is (SSR - SSR*). The expected value of (SSR - SSR*) is a function of only the last $(r - k)$ $\beta$’s. Hence we can test the null hypothesis that each of these $(r - k)$ $\beta$’s vanishes without saying anything about the first k fixed variates. The analysis of variance is:

Source of variation              Degrees of     Mean square                E(MS)
                                 freedom

First k fixed variates           k
Added reduction by last
  (r - k) fixed variates         r - k          (SSR - SSR*)/(r - k)       $\sigma^2 + \theta(\beta_{k+1}, \ldots, \beta_r)$†
Error                            n - r - 1      $s^2$                      $\sigma^2$

† $\theta$ is a function of only $\{\beta_{k+1}, \beta_{k+2}, \ldots, \beta_r\}$.

Under the null hypothesis $H_0$: $\{\beta_{k+1} = \beta_{k+2} = \cdots = \beta_r = 0\}$, $\theta = 0$. Hence

$$F = \frac{\mathrm{SSR} - \mathrm{SSR}^*}{s^2(r - k)},$$

with $(r - k)$ and $(n - r - 1)$ degrees of freedom.

Exercises 14.1 to 14.10 pertain to Sec. 14.1. 

14.2. Matrix Algebra.‡ We shall digress here in order that the reader who is unfamiliar with the methods of matrix algebra may become acquainted with the necessary concepts and techniques used in simplifying the theory and computations of regression.

† $s^2$ is denoted as $s^2_{y \cdot x}$ in most discussions of regression analysis.

‡ The reader may omit this section if he does not want a more theoretical discussion than Sec. 14.1.

A matrix is an array of quantities and may be represented as follows:

$$\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n}\\ a_{21} & a_{22} & \cdots & a_{2n}\\ \vdots & \vdots & & \vdots\\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{bmatrix}.$$

The quantities $a_{ij}$ are called the elements of the matrix. This is a matrix of m rows and n columns. If m = n, the matrix is called a square matrix. The number of rows or columns of the square matrix is called the order of the matrix. If $a_{ij} = a_{ji}$, the matrix is called a symmetric matrix. Two matrices are equal if and only if corresponding elements are identical. For the most part we shall be concerned in regression analyses with square symmetric matrices.

Two matrices are added in the following manner:

$$\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} + \begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11} + b_{11} & a_{12} + b_{12}\\ a_{21} + b_{21} & a_{22} + b_{22} \end{bmatrix}.$$

Subtraction is defined in a similar fashion.

In multiplying two matrices, the elements of the product matrix are obtained by multiplying the elements of a row of the first matrix by the corresponding elements of a column of the second matrix and adding these product terms. For example,

$$\begin{bmatrix} a_{11} & a_{12}\\ a_{21} & a_{22} \end{bmatrix} \cdot \begin{bmatrix} b_{11} & b_{12}\\ b_{21} & b_{22} \end{bmatrix} = \begin{bmatrix} a_{11}b_{11} + a_{12}b_{21} & a_{11}b_{12} + a_{12}b_{22}\\ a_{21}b_{11} + a_{22}b_{21} & a_{21}b_{12} + a_{22}b_{22} \end{bmatrix}.$$

If we let A stand for the first matrix on the left above and B the second, it is obvious that $A \cdot B$ may not necessarily be equal to $B \cdot A$.
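The row-by-column rule, and the fact that the order of the factors matters, can be seen in a small numerical case. The matrices below are hypothetical and the function name `matmul` is chosen only for this sketch.

```python
def matmul(A, B):
    """Product of two matrices: element (i, j) is row i of A times column j of B."""
    inner = len(B)
    if any(len(row) != inner for row in A):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(inner)) for j in range(len(B[0]))]
            for i in range(len(A))]


# Hypothetical 2 x 2 matrices.
A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 0]]

AB = matmul(A, B)  # multiplying by B on the right interchanges the columns of A
BA = matmul(B, A)  # multiplying by B on the left interchanges the rows of A
print(AB)  # [[2, 1], [4, 3]]
print(BA)  # [[3, 4], [1, 2]]
```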

Division of one matrix by another is defined as an inverse operation of 
multiplication. Let 


\[ A \cdot B = G. \]

Then, multiplying the equation on the left by $A^{-1}$ (the inverse of $A$), we
find

\[ A^{-1} \cdot A \cdot B = A^{-1} G. \tag{7} \]

Now, $A^{-1}$ is defined to be a matrix such that

\[ A^{-1} \cdot A = A \cdot A^{-1} = I, \]

where I is called the identity matrix and plays the same role in matrix 
algebra that 1 plays in ordinary algebra. The identity matrix I in terms 
of its elements is

\[
I = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}.
\]

Returning to equation (7), we see that

\[ B = A^{-1} G. \]

We can now define the operation of dividing matrix G by matrix A as 
yielding the matrix B obtained by multiplying matrix G on the left by 
the inverse matrix of A. 

In order to explain a general method of obtaining the inverse of a 
matrix, we need to recall the definition and basic properties of determinants. 
The elements of a square matrix determine the determinant of the matrix. 
A determinant is a function of the elements of a square array. It may be 
expressed as a polynomial by expanding the determinant by minors 
according to Cramer’s rule. 

The order of a determinant is its number of rows or columns. A minor 
of an element of a determinant is the determinant of one less order found 
by striking out the row and column containing the particular element. 
The cofactor $A_{ij}$ of the element $a_{ij}$ is obtained by multiplying $(-1)^{i+j}$ by
the minor of $a_{ij}$. Cramer's rule permits us to evaluate a determinant by
multiplying the elements of any row or column by the corresponding
cofactors and summing these products.

The element $a^{ij}$ of the inverse $A^{-1}$ of the matrix $A$ is obtained by dividing
the cofactor $A_{ji}$ by the determinant $|A|$ of the matrix, that is,

\[ a^{ij} = \frac{A_{ji}}{|A|}. \]

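Example 14.1. In order to find the inverse $A^{-1}$ of the matrix

\[
A = \begin{bmatrix} 0 & 3 & 2 \\ 1 & 2 & 4 \\ 3 & 0 & 2 \end{bmatrix},
\]

we evaluate the determinant of this matrix:

\[
|A| = \begin{vmatrix} 0 & 3 & 2 \\ 1 & 2 & 4 \\ 3 & 0 & 2 \end{vmatrix}
= 0 \begin{vmatrix} 2 & 4 \\ 0 & 2 \end{vmatrix}
- 3 \begin{vmatrix} 1 & 4 \\ 3 & 2 \end{vmatrix}
+ 2 \begin{vmatrix} 1 & 2 \\ 3 & 0 \end{vmatrix}
= 0 - 3(2 - 12) + 2(0 - 6) = 18.
\]

The cofactors are

\[
A_{11} = (-1)^{1+1} \begin{vmatrix} 2 & 4 \\ 0 & 2 \end{vmatrix} = 4, \qquad
A_{12} = (-1)^{1+2} \begin{vmatrix} 1 & 4 \\ 3 & 2 \end{vmatrix} = 10,
\]

and so forth. We find the complete matrix of the cofactors to be

\[
[A_{ij}] = \begin{bmatrix} 4 & 10 & -6 \\ -6 & -6 & 9 \\ 8 & 2 & -3 \end{bmatrix}.
\]

Upon interchanging rows and columns and dividing each element by $|A|$,
we obtain the inverse matrix

\[
A^{-1} = \begin{bmatrix}
\frac{4}{18} & -\frac{6}{18} & \frac{8}{18} \\
\frac{10}{18} & -\frac{6}{18} & \frac{2}{18} \\
-\frac{6}{18} & \frac{9}{18} & -\frac{3}{18}
\end{bmatrix}.
\]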

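We may verify numerically that

\[
A \cdot A^{-1} = A^{-1} \cdot A = I =
\begin{bmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 1 \end{bmatrix}.
\]

For example, the first element of the $I$ matrix is obtained as

\[ 0 \cdot \left(\tfrac{4}{18}\right) + 3 \cdot \left(\tfrac{10}{18}\right) + 2 \cdot \left(-\tfrac{6}{18}\right) = 1. \]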
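
The arithmetic of Example 14.1 can be repeated mechanically. The following is a minimal sketch of the cofactor method described above, using exact fractions; the function names (`minor`, `det`, `cofactor_matrix`, `inverse`) are illustrative choices, not notation from the text, and the recursive expansion is suitable only for small matrices.

```python
from fractions import Fraction

def minor(M, i, j):
    """Array left after striking out row i and column j."""
    return [row[:j] + row[j + 1:] for k, row in enumerate(M) if k != i]

def det(M):
    """Determinant by expansion along the first row (elements times cofactors)."""
    if len(M) == 1:
        return M[0][0]
    return sum((-1) ** j * M[0][j] * det(minor(M, 0, j)) for j in range(len(M)))

def cofactor_matrix(M):
    """Matrix of cofactors A_ij = (-1)^(i+j) times the minor of a_ij."""
    n = len(M)
    return [[(-1) ** (i + j) * det(minor(M, i, j)) for j in range(n)]
            for i in range(n)]

def inverse(M):
    """Interchange rows and columns of the cofactor matrix, then divide by |M|."""
    d = det(M)
    cof = cofactor_matrix(M)
    n = len(M)
    return [[Fraction(cof[j][i], d) for j in range(n)] for i in range(n)]

A = [[0, 3, 2],
     [1, 2, 4],
     [3, 0, 2]]

det_A = det(A)
cofactors = cofactor_matrix(A)
A_inv = inverse(A)

# Numerical verification that A . A^{-1} is the identity matrix.
product = [[sum(A[i][k] * A_inv[k][j] for k in range(3)) for j in range(3)]
           for i in range(3)]

print("|A| =", det_A)
print("cofactors =", cofactors)
print("A^{-1} =", [[str(v) for v in row] for row in A_inv])
```

Note that `Fraction` prints each element in lowest terms (for instance 2/9 rather than 4/18), which is the same value as in the example.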

Some of the results of this section will now be used to simplify and 
develop the theory of Sec. 14.1. The normal equations of Sec. 14.1 may 
be written in matrix form as 


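\[ A \cdot B = G, \tag{4$'$} \]

where

\[
A = \begin{bmatrix}
Sx_1^2 & Sx_1x_2 & \cdots & Sx_1x_r \\
Sx_1x_2 & Sx_2^2 & \cdots & Sx_2x_r \\
\vdots & \vdots & & \vdots \\
Sx_1x_r & Sx_2x_r & \cdots & Sx_r^2
\end{bmatrix}, \qquad
B = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_r \end{bmatrix}, \qquad \text{and} \qquad
G = \begin{bmatrix} Sx_1y \\ Sx_2y \\ \vdots \\ Sx_ry \end{bmatrix}.
\]

The matrix $A$ is a square symmetric matrix, while the matrices $B$ and $G$
are single-column matrices.

Upon multiplying equation (4$'$) on the left by $A^{-1}$, we find

\[ A^{-1} \cdot A \cdot B = A^{-1} G, \qquad B = A^{-1} G. \]

Let

\[
A^{-1} = C = \begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1r} \\
c_{21} & c_{22} & \cdots & c_{2r} \\
\vdots & \vdots & & \vdots \\
c_{r1} & c_{r2} & \cdots & c_{rr}
\end{bmatrix}.
\]

Then

\[
\begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_k \\ \vdots \\ b_r \end{bmatrix}
=
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1r} \\
c_{21} & c_{22} & \cdots & c_{2r} \\
\vdots & \vdots & & \vdots \\
c_{k1} & c_{k2} & \cdots & c_{kr} \\
\vdots & \vdots & & \vdots \\
c_{r1} & c_{r2} & \cdots & c_{rr}
\end{bmatrix}
\begin{bmatrix} Sx_1y \\ Sx_2y \\ \vdots \\ Sx_ky \\ \vdots \\ Sx_ry \end{bmatrix}.
\]

Hence, since two matrices are equal if and only if corresponding elements
are equal, we see that

\[ b_k = c_{k1} Sx_1y + c_{k2} Sx_2y + \cdots + c_{kr} Sx_ry \]

or

\[ b_k = \sum_{j=1}^{r} c_{kj} Sx_jy, \]

which is the same as the results of equation (6).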

Instead of inverting the matrix $A$ directly, as illustrated earlier, in
order to obtain the elements $c_{ij}$ of $C = A^{-1}$, it is more convenient to
proceed indirectly. We know that

\[ A \cdot A^{-1} = I \]

or

\[ A \cdot C = I. \]

Writing the elements in for each matrix, we have

\[
\begin{bmatrix}
Sx_1^2 & Sx_1x_2 & \cdots & Sx_1x_r \\
Sx_1x_2 & Sx_2^2 & \cdots & Sx_2x_r \\
\vdots & \vdots & & \vdots \\
Sx_1x_r & Sx_2x_r & \cdots & Sx_r^2
\end{bmatrix}
\begin{bmatrix}
c_{11} & c_{12} & \cdots & c_{1r} \\
c_{21} & c_{22} & \cdots & c_{2r} \\
\vdots & \vdots & & \vdots \\
c_{r1} & c_{r2} & \cdots & c_{rr}
\end{bmatrix}
=
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}.
\]

Again, since two matrices are equal if and only if the corresponding
elements are equal, we can immediately write down $r$ sets of equations. For
example, the first set may be obtained by multiplying all the rows of the
$A$ matrix by the first column of the $C$ matrix and setting these sums of
products in turn equal to the elements of the first column of the identity
matrix. We obtain

\[
\begin{aligned}
c_{11} Sx_1^2 + c_{21} Sx_1x_2 + \cdots + c_{r1} Sx_1x_r &= 1, \\
c_{11} Sx_1x_2 + c_{21} Sx_2^2 + \cdots + c_{r1} Sx_2x_r &= 0, \\
&\;\;\vdots \\
c_{11} Sx_1x_r + c_{21} Sx_2x_r + \cdots + c_{r1} Sx_r^2 &= 0.
\end{aligned}
\]

In a similar fashion the other $(r - 1)$ sets of equations may be obtained.

The solution of these sets of equations enables us to find the elements $c_{ij}$ of
the inverse matrix $C = A^{-1}$. Short methods for obtaining simultaneous
solutions to all $r$ sets of equations will be presented in Chap. 15. It should
be noted that $c_{ij} = c_{ji}$ because of the symmetry of matrix $A$.

14.3. Theory of Tests of Significance.† For references to the theory
of tests of significance with regression problems see Bartlett,6 Yates,7 and
R. A. Fisher.8

The equation $Y = \mu + \sum \beta_i x_i + \epsilon = \bar{Y} + \sum b_i x_i + e$ can be replaced by
a new equation

\[ Y = \mu + \sum \beta_i' z_i + \epsilon = \bar{Y} + \sum b_i' z_i + e, \]

where $\{z_i\}$ are functions of $\{x_i\}$ so constructed that $\{z_i\}$ are completely
orthogonal variates. As before, the $\epsilon$ are NID$(0, \sigma^2)$. Hence the $Y$'s are
NID$(\mu + \sum \beta_i' z_i, \sigma^2)$. We might write

\[
\begin{aligned}
z_1 &= w_{11} x_1, \\
z_2 &= w_{21} x_1 + w_{22} x_2, \\
&\;\;\vdots \\
z_r &= w_{r1} x_1 + w_{r2} x_2 + \cdots + w_{rr} x_r,
\end{aligned}
\]

where $Sz_i^2 = 1$ and $S(z_i z_j) = 0$ $(i \ne j)$. Hence

\[ Y = \mu + \sum \beta_i' z_i + \epsilon \equiv \mu + \sum \beta_i x_i + \epsilon, \]

where $\{x_i\}$ are solved backward in terms of $\{z_i\}$. The least-squares
estimate of $\beta_i'$ is $b_i' = Sz_i y = Sz_i Y$, because of the orthogonal relationships.
Also,

\[ SSE = S\left(y - \sum b_i' z_i\right)^2 = Sy^2 - \sum_{i=1}^{r} (b_i')^2. \]

Hence $\sum (b_i')^2$ is the reduction in sum of squares due to the regression.

† This section can be omitted if the reader is not interested in a more theoretical
presentation than in Sec. 14.1.

Before continuing, we should show that our solution to the least-squares
equations is unique. We know that

\[ \hat{Y} = \bar{Y} + \sum_{i=1}^{r} b_i' z_i = \bar{Y} + \sum_{i=1}^{r} b_i'' x_i, \]

where $b_i'' = \sum_{j=i}^{r} w_{ji} b_j'$. For example,

\[ b_1'' = w_{11} b_1' + w_{21} b_2' + \cdots + w_{r1} b_r'. \]

But $b_i'' = b_i$, because both were derived by minimizing $Se^2$. There can
be but one minimum because the $Se^2$ equation is quadratic in the $b$'s.
Hence

\[ \sum (b_i')^2 = \sum b_i Sx_i y. \]
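
The equality of the two expressions for the reduction in sum of squares can be illustrated numerically. The sketch below is an assumption-laden illustration: the six observations are invented for the purpose, the $\{z_i\}$ are constructed by the Gram-Schmidt process (one way of obtaining variates with $Sz_i^2 = 1$ and $Sz_iz_j = 0$ in which $z_j$ depends only on $x_1, \ldots, x_j$), and the ordinary coefficients are found from the normal equations for $r = 2$.

```python
import math

def centered(values):
    """Deviations from the mean (the lower-case x and y of the text)."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def dot(u, v):
    """Sum of products over the observations (the operator S)."""
    return sum(a * b for a, b in zip(u, v))

def orthonormalize(xs):
    """Gram-Schmidt: build z_1, z_2, ... with Sz_i^2 = 1 and Sz_i z_j = 0."""
    zs = []
    for x in xs:
        residual = list(x)
        for z in zs:
            c = dot(z, x)
            residual = [a - c * b for a, b in zip(residual, z)]
        norm = math.sqrt(dot(residual, residual))
        zs.append([a / norm for a in residual])
    return zs

# Hypothetical data, invented for this illustration only.
X1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
X2 = [2.0, 1.0, 4.0, 3.0, 6.0, 8.0]
Y = [3.1, 3.9, 7.2, 7.8, 11.1, 13.9]

x1, x2, y = centered(X1), centered(X2), centered(Y)
z1, z2 = orthonormalize([x1, x2])

# Orthogonalized coefficients b'_i = S z_i y.
b1_prime, b2_prime = dot(z1, y), dot(z2, y)

# Ordinary coefficients from the normal equations for r = 2.
s11, s12, s22 = dot(x1, x1), dot(x1, x2), dot(x2, x2)
g1, g2 = dot(x1, y), dot(x2, y)
det_a = s11 * s22 - s12 ** 2
b1 = (s22 * g1 - s12 * g2) / det_a
b2 = (s11 * g2 - s12 * g1) / det_a

ssr_orthogonal = b1_prime ** 2 + b2_prime ** 2   # sum of (b'_i)^2
ssr_normal = b1 * g1 + b2 * g2                   # sum of b_i S x_i y
reduction_x1_alone = g1 ** 2 / s11               # reduction due to x_1 only

print(ssr_orthogonal, ssr_normal)
print(b1_prime ** 2, reduction_x1_alone)
```

The last line also shows that $(b_1')^2$ equals the reduction obtained from $x_1$ alone, which is the property used later when the reduction is split between the first $k$ and the remaining fixed variates.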

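We shall use the orthogonalized regression coefficients $\{b_i'\}$ in the theory
which follows, always remembering that any $x_j$ is a function of only
$z_1, z_2, \ldots, z_j$.

Because of the relationship

\[
Sz_i z_j = \begin{cases} 0, & i \ne j, \\ 1, & i = j, \end{cases}
\]

\[
Sz_i\left(Y - \mu - \sum_{j} \beta_j' z_j\right) = Sz_i \epsilon = Sz_i Y - \sum_{j} \beta_j' Sz_i z_j = b_i' - \beta_i'.
\]

Hence

\[ (b_i' - \beta_i') = Sz_i \epsilon, \]

a completely orthogonal form in the $\epsilon$, which are NID$(0, \sigma^2)$. From our
theory on orthogonal forms, we know that

\[
\begin{aligned}
E(b_i' - \beta_i') &= Sz_i E(\epsilon) = 0 \quad \text{or} \quad E(b_i') = \beta_i', \\
\sigma^2(b_i') &= E[(b_i' - \beta_i')^2] = Sz_i^2 \sigma^2 = \sigma^2, \\
E(b_i' - \beta_i')(b_j' - \beta_j') &= 0, \quad i \ne j.
\end{aligned}
\]

Since the $\epsilon$ are NID, so are $(b_i' - \beta_i')$. Also, $t = (b_i' - \beta_i')/s$, where
$E(s^2) = \sigma^2$. Or in terms of the original variates $\{X_i\}$

\[ t = \frac{b_i - \beta_i}{s \sqrt{\sum_{j=i}^{r} w_{ji}^2}}, \qquad i = 1, 2, \ldots, r. \]

Next we need to determine $s^2$.

\[
\begin{aligned}
SSE &= S\left(y - \sum b_i' z_i\right)^2 = S\left[y - \sum \beta_i' z_i - \sum (b_i' - \beta_i') z_i\right]^2 \\
    &= S\left(y - \sum \beta_i' z_i\right)^2 - \sum (b_i' - \beta_i')^2 = S(\epsilon - \bar{\epsilon})^2 - \sum (b_i' - \beta_i')^2.
\end{aligned}
\]

\[ E(SSE) = (n - 1)\sigma^2 - r\sigma^2 = (n - r - 1)\sigma^2. \]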

Hence if we let $s^2 = SSE/(n - r - 1)$, $E(s^2) = \sigma^2$.

If it is desired to make a test of the hypothesis $\{\beta_i' = 0\}$, it would be
useful to be able to use the $F$ ratio as a test criterion. In order to use $F$,
it is necessary to find two quantities distributed as $\chi^2\sigma^2$ such that the
ratio of the two will test the hypothesis $\{\beta_i' = 0\}$ against the alternative
$\{\beta_i' \ne 0\}$. It would appear from the above that SSE is distributed as
$\chi^2\sigma^2$ with $(n - r - 1)$ degrees of freedom. A more rigorous method of
proof is the following:

Augment the existing set of $r$ completely orthogonal variates $\{z_i\}$ by
$(n - r - 1)$ others, which we shall designate as $\{p_j\}$ $(j = 1, 2, \ldots,
n - r - 1)$. The estimation equation will then be

\[ Y = \mu + \sum_{i=1}^{r} \beta_i' z_i + \epsilon = \bar{Y} + \sum_{i} b_i' z_i + \sum_{j} c_j' p_j, \]

where $c_j'$ is the regression coefficient for $p_j$ $[E(\sum c_j' p_j) = 0]$. Note that with
these $n$ orthogonal variates, there is no residual sum of squares. Hence

\[ S\left[y - \sum b_i' z_i - \sum c_j' p_j\right]^2 = 0. \]

\[
\begin{aligned}
0 &= S\left[y - \sum \beta_i' z_i - \sum (b_i' - \beta_i') z_i - \sum c_j' p_j\right]^2 \\
  &= S\left[y - \sum \beta_i' z_i\right]^2 - \sum (b_i' - \beta_i')^2 - \sum (c_j')^2,
\end{aligned}
\]

\[ \sum_{j=1}^{n-r-1} (c_j')^2 = S(\epsilon - \bar{\epsilon})^2 - \sum_{i=1}^{r} (b_i' - \beta_i')^2 = SSE. \]

Hence we have broken SSE into $(n - r - 1)$ orthogonal squares.
Furthermore,

\[
\begin{aligned}
c_j' &= S(y p_j) = S(\epsilon p_j) + \sum_{i} \beta_i' S(z_i p_j), \\
E\left[c_j' - \sum \beta_i' S(z_i p_j)\right] &= 0, \qquad E(c_j') = \sum \beta_i' S(z_i p_j) = 0, \\
\sigma^2(c_j') &= E[(S\epsilon p_j)^2] = \sigma^2, \qquad \sigma(c_j' c_k') = 0, \quad k \ne j, \\
\sigma(c_j' b_i') &= E[c_j'(b_i' - \beta_i')] = E[S(\epsilon p_j) S(\epsilon z_i)] = 0.
\end{aligned}
\]

Since the $\{c_j'\}$ are orthogonal linear forms in NID$(0, \sigma^2)$ variates, they
are NID$(0, \sigma^2)$. Hence the $(c_j')^2$ are independently distributed as $\chi^2\sigma^2$
with 1 degree of freedom each, or SSE is so distributed with $(n - r - 1)$
degrees of freedom.†

We have already shown that the $(b_i' - \beta_i')$ are NID$(0, \sigma^2)$; hence,
$(b_i' - \beta_i')^2$ is distributed as $\chi^2\sigma^2$ with 1 degree of freedom, and $\sum (b_i')^2$ will
be distributed as $\chi^2\sigma^2$ with $r$ degrees of freedom, under the null hypothesis
$\{\beta_i' = 0\}$.† Hence

\[ F = \frac{\sum (b_i')^2 / r}{SSE/(n - r - 1)} \]

can be used to test the null hypothesis. Also, we see that the single-tailed
$F$ test should be used if the alternative hypothesis is $\{\beta_i' \ne 0\}$, because
the expected value of the denominator is still $\sigma^2$ while the expected value
of the numerator is

\[
\begin{aligned}
E\left[\sum_i (b_i')^2\right]/r &= \sum_i \{E[(b_i' - \beta_i')^2] + 2\beta_i' E(b_i' - \beta_i') + (\beta_i')^2\}/r \\
&= \sum_i [\sigma^2 + (\beta_i')^2]/r = \sigma^2 + \sum_i (\beta_i')^2/r > \sigma^2.
\end{aligned}
\]

† Since $\{c_j'\}$ and $\{b_i'\}$ are independent of one another, SSE and SSR are independent
of one another. This is a necessary condition for the statements concerning $F$.

There are many conditions under which it would be desirable to assume
several of the $\beta_i' \ne 0$. For a general test, we shall assume $\{\beta_1', \beta_2', \ldots,
\beta_k' \ne 0\}$ and test the null hypothesis that $\{\beta_{k+1}', \ldots, \beta_r' = 0\}$.

We have shown that SSE, found by fitting $b_i'$ for $i = 1, 2, \ldots, r$,
is independent of the hypotheses for $\{\beta_i'\}$ and that the $(b_i' - \beta_i')^2$ are
independently distributed as $\chi^2\sigma^2$ with 1 degree of freedom each. Hence
under the null hypothesis that $\{\beta_{k+1}', \ldots, \beta_r' = 0\}$,

\[ \sum_{i=k+1}^{r} (b_i')^2 \]

is distributed as $\chi^2\sigma^2$ with $(r - k)$ degrees of freedom. We know that the
reduction in the residual sum of squares due to fitting $r$ $\{b_i'\}$ is given by

\[ \sum_{i=1}^{r} (b_i')^2 \]

and that due to fitting the first $k$ $\{b_i'\}$ is

\[ \sum_{i=1}^{k} (b_i')^2. \]

Hence

\[ \sum_{i=k+1}^{r} (b_i')^2 \]

is the additional reduction in sum of squares gained by using $\{b_{k+1}', \ldots, b_r'\}$
in the regression equation after first using $\{b_1', \ldots, b_k'\}$. It should be
emphasized that, by use of the orthogonal transformation, every $b_i'$ can be
computed independently of all the others. Hence the reduction due to
any one of the regression coefficients is independent of whether any others
have been used and is independent of any assumptions made about the
expected values of these others.

The major difficulty facing us at this stage is to transfer these results on
the additional reduction due to the last $(r - k)$ fixed variates back to the
original $\{b_i, \beta_i\}$ setup, where the value of any regression coefficient depends
upon all the others included in the analysis. For example, the values of
$\{b_1, b_2, \ldots, b_k\}$ as estimated from all the $r$ $\{x_i\}$ will not be the same as
the $\{b_1'', b_2'', \ldots, b_k''\}$ as estimated from the first $k$ $\{x_i\}$. In order to use
our orthogonal results, the following conditions must hold.

(1) If we let $R_r$ be the reduction in the residual sum of squares due to
the first $r$ and $R_k$ that due to the first $k$ $\{x_i\}$, then $(R_r - R_k)$ must
be distributed as $\chi^2\sigma^2$ with $(r - k)$ degrees of freedom under the null
hypothesis $\{\beta_{k+1}, \beta_{k+2}, \ldots, \beta_r = 0\}$.

(2) The residual sum of squares $Sy^2 - R_r$ is independently distributed
as $\chi^2\sigma^2$ with $(n - r - 1)$ degrees of freedom.

We have shown that

\[ \sum_{i=1}^{k} (b_i' - \beta_i')^2 \]

is distributed as $\chi^2\sigma^2$ with $k$ degrees of freedom,

\[ \sum_{i=k+1}^{r} (b_i' - \beta_i')^2 \]

as $\chi^2\sigma^2$ with $(r - k)$ degrees of freedom, and

\[ SSE = S(\epsilon - \bar{\epsilon})^2 - \sum_{i=1}^{r} (b_i' - \beta_i')^2 = Sy^2 - \sum_{i=1}^{r} (b_i')^2 \]

as $\chi^2\sigma^2$ with $(n - r - 1)$ degrees of freedom, all independent of one
another.

But we have also shown that both methods of deriving the regression 
coefficients minimize Se 2 and hence produce the same SSE. Hence 

\[ Sy^2 - R_r = SSE \]

is distributed as $\chi^2\sigma^2$ with $(n - r - 1)$ degrees of freedom. Also

\[ R_r = \sum_{i=1}^{r} (b_i')^2. \]



If we solve backward for the $\{x_i\}$ in terms of the $\{z_i\}$, we find that

\[ x_i = w_{i1}' z_1 + w_{i2}' z_2 + \cdots + w_{ii}' z_i, \]

where $w_{ij}'$ are functions of the $w_{ij}$. For example,

\[ w_{11}' = 1/w_{11}, \qquad w_{21}' = -(w_{21}/w_{11}w_{22}), \qquad w_{22}' = 1/w_{22}. \]

Since $\sum \beta_j x_j = \sum \beta_i' z_i$ and by equating the coefficients of $z_i$ we find that

\[ \beta_i' = \sum_{j=i}^{r} w_{ji}' \beta_j. \]

This shows that the null hypothesis $\{\beta_{k+1}, \ldots, \beta_r = 0\}$ means that
$\{\beta_{k+1}', \ldots, \beta_r' = 0\}$ because the $\beta_i'$ are functions of only $\{\beta_i, \beta_{i+1}, \ldots, \beta_r\}$.
Hence $\sum_{i=k+1}^{r} (b_i')^2$ is distributed as $\chi^2\sigma^2$ with $(r - k)$ degrees of freedom
under the null hypothesis for the $\{\beta_i\}$.

Now $R_k$ is the reduction due to the first $k$ $\{x_i\}$. But the first $k$ $\{z_i\}$ are
functions of only the first $k$ $\{x_i\}$. For example,

\[ z_k = \sum_{j=1}^{k} w_{kj} x_j. \]

Again the reduction due to using the $\{x_i\}$ or $\{z_i\}$ must be the same, so that
$R_k = \sum_{i=1}^{k} (b_i')^2$. But since $R_r = \sum_{i=1}^{r} (b_i')^2$, the added reduction after the
first $k$ $\{x_i\}$ is given by

\[ R_r - R_k = \sum_{i=k+1}^{r} (b_i')^2. \]

It might be added that the $\{z_i\}$ behave just like orthogonal polynomials,
which will be discussed in Chap. 16.

14.4. The Regression Problem When Certain Assumptions Are
Relaxed. Suppose we relax the assumptions of normality and
homoscedasticity in the previous sections of this chapter. For this case, David
and Neyman 9 give a proof of the following theorem on least squares.
Given

(i) $Y_1, Y_2, \ldots, Y_n$ are independent.

(ii) $E(Y_j) = \sum_{i=1}^{r} \beta_i X_{ij}$ and the $X$'s are fixed variates.†

(iii) Out of the $n$ equations in (ii), it is possible to select at least one
system of $r$ equations soluble with respect to the $\beta$'s.

(iv) The variances of the $Y_j$ satisfy the relationship

\[ \sigma_j^2 = \sigma^2/w_j, \qquad j = 1, 2, \ldots, n, \]
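where $\sigma^2$ may be unknown but the $\{w_j\}$ are known positive constants.
Then

† If it is desired to use the intercept $\alpha$ in the equation, set $X_{1j} \equiv 1$.

(a) The best linear estimate of $Y$ is

\[ \hat{Y} = \sum_{i=1}^{r} b_i X_i, \]

where the $b$'s are obtained by minimizing the weighted sum of squares

\[ SSE_w = S w_j \left(Y_j - \sum_{i=1}^{r} b_i X_{ij}\right)^2 \]

with respect to the $b_i$. The typical least-squares equation is

\[ \frac{\partial SSE_w}{\partial b_k} = 0: \quad b_1 SwX_1X_k + \cdots + b_k SwX_k^2 + \cdots + b_r SwX_rX_k = SwX_kY. \]

(b) The estimate of $\sigma^2(\hat{Y})$ is given by

\[ s^2(\hat{Y}) = \frac{-\Delta_0 \Delta_1}{(n - r)\Delta_2^2}, \]

where

\[
\Delta_0 = \begin{vmatrix}
H_0 & H_1 & \cdots & H_r \\
H_1 & G_{11} & \cdots & G_{1r} \\
\vdots & \vdots & & \vdots \\
H_r & G_{r1} & \cdots & G_{rr}
\end{vmatrix}, \qquad
\Delta_2 = \begin{vmatrix}
G_{11} & \cdots & G_{1r} \\
\vdots & & \vdots \\
G_{r1} & \cdots & G_{rr}
\end{vmatrix},
\]

\[
\Delta_1 = \begin{vmatrix}
0 & X_1 & \cdots & X_r \\
X_1 & G_{11} & \cdots & G_{1r} \\
\vdots & \vdots & & \vdots \\
X_r & G_{r1} & \cdots & G_{rr}
\end{vmatrix},
\]

\[ H_0 = SwY^2, \qquad H_i = SwX_iY, \qquad G_{ij} = SwX_iX_j. \]

If $r = 2$, with $\beta_1 = \alpha$ and $\beta_2 = \beta$, we have

\[ \hat{Y} = a + bX = \bar{Y}_w + b(X - \bar{X}_w), \]

where $X_1 = 1$, $X_2 = X$, $\bar{Y}_w = SwY/Sw$, and $\bar{X}_w = SwX/Sw$. Also,
$b = Swxy/Swx^2$, where $x = X - \bar{X}_w$ and $y = Y - \bar{Y}_w$, and $\bar{X}_w$ and $\bar{Y}_w$
are called weighted means and $b$ a weighted regression coefficient.
Often the weights are adjusted so that $Sw = 1$.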

Example 14.2. The following data are taken from an experiment on
soybeans with three nitrogen treatments plus a check treatment (no
nitrogen) to find out the relation between the mean yield of soybeans
per acre ($Y$) and amount of nitrogen in pounds per acre ($X$):

    Lb N per acre (X)          0       47       94      157
    Bu beans per acre (Y)     14.7     14.6     17.8     22.1
    No. of plots              14        7        7        7
    w                          2        1        1        1

Since 14 plots were used for the 0 level of nitrogen and 7 plots for each of the
three treatments, the variance of the mean yield for the 0 nitrogen plots
was one-half that of the mean yields for the three treatments; so weights
of (2,1,1,1) were used in estimating the regression of mean yield on
amount of nitrogen per acre.

The calculations needed were

\[ \bar{X}_w = 59.60, \qquad \bar{Y}_w = 16.78, \]
\[ Swx^2 = 17{,}933.20, \qquad Swxy = 828.66, \qquad Swy^2 = 42.75, \]
\[ b = 0.0462, \]
\[ \hat{Y} = 16.78 + 0.0462(X - 59.60). \]

In this case the estimate of variance was taken from the 13 + 3(6) = 31
degrees of freedom within treatments: $s^2 = 7.56$. Hence the estimated
variance of each of the $Y$'s, which were means of $7w$ plots, was

\[ \frac{7.56}{7w} = \frac{1.08}{w}. \]

See Exercise 14.12 for another estimate of the variance, using these 31
degrees of freedom plus the 2 degrees of freedom for deviations from
regression.
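
The weighted means and sums of squares of Example 14.2 are easily recomputed from the four treatment means. The sketch below follows the formulas for $r = 2$ given before the example; the function name `weighted_regression` and the dictionary keys are illustrative choices, not notation from the text.

```python
def weighted_regression(X, Y, w):
    """Weighted means, weighted sums of squares and products, and slope b."""
    sw = sum(w)
    x_bar = sum(wi * xi for wi, xi in zip(w, X)) / sw
    y_bar = sum(wi * yi for wi, yi in zip(w, Y)) / sw
    swx2 = sum(wi * (xi - x_bar) ** 2 for wi, xi in zip(w, X))
    swxy = sum(wi * (xi - x_bar) * (yi - y_bar) for wi, xi, yi in zip(w, X, Y))
    swy2 = sum(wi * (yi - y_bar) ** 2 for wi, yi in zip(w, Y))
    return {"x_bar": x_bar, "y_bar": y_bar,
            "swx2": swx2, "swxy": swxy, "swy2": swy2,
            "b": swxy / swx2}

# Data of Example 14.2.
X = [0.0, 47.0, 94.0, 157.0]      # lb N per acre
Y = [14.7, 14.6, 17.8, 22.1]      # mean yield, bu per acre
w = [2.0, 1.0, 1.0, 1.0]          # weights proportional to number of plots

fit = weighted_regression(X, Y, w)
for name, value in fit.items():
    print(f"{name:6s} = {value:.4f}")

# Estimated variance of a mean of 7w plots, using s^2 = 7.56 from the text.
s2_within = 7.56
variance_of_means = [s2_within / (7 * wi) for wi in w]
print(variance_of_means)
```

The printed values agree with those quoted in the example after rounding.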


Table 14.1

Estimates of σ²(Y) for Example 13.1†

                           All incomes                   Income ≤ $4,000
    Class  Range of X    X̄     Sn    s²(Y)     w       X̄     Sn    s²(Y)     w
      1      39-44      42.3    36    .366   2.732    42.3    36    .366   2.732
      2      45-46      45.1    54    .457   2.188    45.1    54    .457   2.188
      3      47-48      47.7   120    .431   2.320    47.7   119    .332   3.012
      4      49-50      49.9    68    .314   3.185    49.9    68    .314   3.185
      5      51-52      51.4    70    .545   1.835    51.4    70    .545   1.835
      6      53-54      53.4   108    .587   1.704    53.4   108    .587   1.704
      7      55-56      55.4    91    .435   2.299    55.4    91    .435   2.299
      8      57-60      58.4    95   1.111   0.900    58.4    93    .837   1.195
      9      61-64      62.3    73   1.807   0.553    62.3    68    .678   1.475
     10      65-70      67.4    68   1.845   0.542    67.3    61    .623   1.605
     11      71-76      73.6    67   2.727   0.367    73.5    63   1.063   0.941
     12      77-91      80.8    59   8.254   0.121    80.1    45    .894   1.119
    Total                      909                           876


† σ²(Y) measures the variation of net farm incomes about their means. Sn is
the number of farmers for each class (summing the values of n in Table 13.1). The
number of degrees of freedom for each s²(Y) is Sn less the number of X's for the class
(total of 860 and 830, respectively). w = 1/s²(Y).

Example 14.3. It appeared from the graph of the data of Example 13.1
(Fig. 13.3) that the variability about the regression line tended to increase
with increasing $X$. In order to investigate this matter, the data were
grouped into 12 classes, and a separate estimate of variance was computed
for each group. This was done both for all the income data and for that
portion of the data after eliminating all incomes of over $4,000. These
estimates of variance are presented in Table 14.1.

In Table 14.1, $s^2(Y)$ was computed by adding together

\[ Sy^2 = S(Y - \bar{Y})^2 \]

for each $X$ in a class and dividing this sum by the total degrees of freedom
for the group. For example, with the first group ($X$ from 39 to 44),
we had the following distribution:

      X       n     d.f.     Sy²       s²
     39       3       2     1.213     .607
     41       3       2      .341     .170
     42      18      17     8.325     .490
     43       1       0     0
     44      11      10     1.475     .148
    Total    36      31    11.354     .366

The data were grouped like this in order to smooth out the estimates of
$\sigma^2(Y)$ by increasing the number of degrees of freedom for each estimate.

A weighted regression of $Y$ on $X$ can then be computed using as weights
the values of $w$ given in Table 14.1; for example, any observation with
$X$ from 39 to 44 will receive a weight of 2.73. These weights are applied
to the data presented in Table 13.1. Since the same weights are used
for many observations, a short-cut computing procedure can be used as
follows:

(i) Compute for each class: $S(nX)$, $S(SY)$, $S(nX^2)$, $S[X(SY)]$, and
$S[(SY)^2/n]$. It might be easier to compute $\bar{Y}$ for each class and make
the last computation as $S[\bar{Y}(SY)]$.

(ii) Multiply each sum in (i) by the corresponding weight, $w$, in Table
14.1, and sum these weighted values over the 12 classes, for example,
$S(w) = S[w(Sn)]$, $S(wX) = S[w(SnX)]$, $S(wX^2) = S[w(SnX^2)]$.

(iii) Adjust the sums in (ii) for the means, for example,

\[ S(wx^2) = S(wX^2) - \frac{[S(wX)]^2}{S(w)}. \]

(iv) $b = S(wxy)/S(wx^2)$; $SSR = (Swxy)^2/S(wx^2)$.

There is some question as to the proper error term to use in this case,
because the weights were derived from the data. The simplest procedure
is to use

\[ SSE = S(wy^2) - SSR, \]

with $N - 2$ degrees of freedom ($N$ = 909 or 876). In this case

\[ S(wy^2) = (N - m) + S[wS(\bar{Y}SY)] - \frac{[S(wSY)]^2}{S(w)}, \]

where $m$ is the number of $X$'s in Table 13.1 ($m$ = 49 or 46) and the sums
are taken from step (ii) above. In this case $s^2 = SSE/(N - 2)$, and
$F = SSR/s^2$ can be used to test for the significance of the regression of
$Y$ on $X$. In Sec. 25.2, we shall consider another estimate of the error
variance.

The sums in (i) for each class and the weighted sums (ii) over all
classes are presented in Table 14.2.


Table 14.2

    Class    S(nX)     S(SY)      S(nX²)     S(XSY)    S(ȲSY)      w
      1      1,523      26.2      64,503      1,106      19.7    2.732
      2      2,433      52.5     109,623      2,365      51.1    2.188
      3      5,726     110.2     273,250      5,258     101.2    2.320
      4      3,383      55.0     168,317      2,739      45.2    3.185
      5      3,601      76.2     185,263      3,922      83.3    1.835
      6      5,769     129.7     308,187      6,939     159.9    1.704
      7      5,044     113.1     279,604      6,278     144.6    2.299
      8      5,546     139.3     323,872      8,139     206.6     .900
      9      4,548     118.9     283,438      7,423     204.1     .553
     10      4,580     130.4     308,690      8,796     252.3     .542
     11      4,930     152.8     362,964     11,270     380.7     .367
     12      4,769     211.7     386,097     17,399   1,112.6     .121
    Sum†    73,953   1,581.6   3,889,486     85,013   2,012.0

† Weighted by weights, w. S(wSn) = 1,427.63.
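
The weighted sums in the last line of Table 14.2 and the quantities of steps (iii) and (iv) can be rechecked with a few lines of code. This is only a sketch for the "all incomes" case: the class sums are copied from Table 14.2 and the class sizes Sn from Table 14.1, and because those entries are already rounded, the results agree with the printed figures only to within rounding. The variable names are illustrative.

```python
# Per class: S(nX), S(SY), S(nX^2), S(X SY), S(Ybar SY), w   (Table 14.2)
table_14_2 = [
    (1523, 26.2, 64503, 1106, 19.7, 2.732),
    (2433, 52.5, 109623, 2365, 51.1, 2.188),
    (5726, 110.2, 273250, 5258, 101.2, 2.320),
    (3383, 55.0, 168317, 2739, 45.2, 3.185),
    (3601, 76.2, 185263, 3922, 83.3, 1.835),
    (5769, 129.7, 308187, 6939, 159.9, 1.704),
    (5044, 113.1, 279604, 6278, 144.6, 2.299),
    (5546, 139.3, 323872, 8139, 206.6, 0.900),
    (4548, 118.9, 283438, 7423, 204.1, 0.553),
    (4580, 130.4, 308690, 8796, 252.3, 0.542),
    (4930, 152.8, 362964, 11270, 380.7, 0.367),
    (4769, 211.7, 386097, 17399, 1112.6, 0.121),
]
class_sizes = [36, 54, 120, 68, 70, 108, 91, 95, 73, 68, 67, 59]  # Sn, Table 14.1
N, m = 909, 49   # number of farmers and number of distinct X's (all incomes)

# Step (ii): weight each class sum and add over the 12 classes.
sw = sum(row[5] * n for row, n in zip(table_14_2, class_sizes))
swX = sum(row[5] * row[0] for row in table_14_2)
swY = sum(row[5] * row[1] for row in table_14_2)
swX2 = sum(row[5] * row[2] for row in table_14_2)
swXY = sum(row[5] * row[3] for row in table_14_2)
swYY = sum(row[5] * row[4] for row in table_14_2)

# Step (iii): adjust for the weighted means.
swx2 = swX2 - swX ** 2 / sw
swxy = swXY - swX * swY / sw
swy2 = (N - m) + swYY - swY ** 2 / sw

# Step (iv) and the error term.
b = swxy / swx2
ssr = swxy ** 2 / swx2
sse = swy2 - ssr
s2 = sse / (N - 2)

print(f"S(w) = {sw:.2f}, X_w = {swX / sw:.2f}, Y_w = {swY / sw:.3f}")
print(f"S(wx^2) = {swx2:.0f}, S(wxy) = {swxy:.0f}, S(wy^2) = {swy2:.0f}")
print(f"b = {b:.4f}, SSR = {ssr:.0f}, SSE = {sse:.0f}, s^2 = {s2:.3f}")
```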


The sums adjusted for the mean are 

\[ S(wx^2) = 3{,}889{,}486 - 3{,}830{,}857 = 58{,}629, \]
\[ S(wxy) = 85{,}013 - 81{,}929 = 3{,}084, \]
\[ S(wy^2) = 860 + 2{,}012 - 1{,}752 = 1{,}120. \]

Hence

\[ b = S(wxy)/S(wx^2) = 0.05260, \]
\[ SSR = (Swxy)^2/S(wx^2) = 162, \]
\[ SSE = S(wy^2) - SSR = 958, \]
\[ s^2 = SSE/907 = 1.056, \]
\[ \hat{Y} = 1.108 + 0.0526(X - 51.80). \]

EXERCISES†

14.1. Show that the variance of $\bar{Y}$ is $\sigma^2/n$ and that $\bar{Y}$ is independent of
each of the $b_i$.

14.2. (a) If we use the model, $\hat{Y} = a + \sum b_i X_i$, show that

\[ a = \bar{Y} - \sum b_i \bar{X}_i. \]

(b) What is the variance of $a$?

14.3. What changes should be made in the regression analysis if

\[ E(Y) = \sum \beta_i X_i? \]

14.4. Given the regression model $Y = \mu + \sum_{i=1}^{r} \beta_i x_i + \epsilon = \hat{Y} + e$.
Show that

(a) $e$ is uncorrelated with any of the $x_i$, that is, $Sx_i e = 0$ (for any $i$).

(b) $S\hat{Y}e = 0$.

(c) $Se^2 = SSE = SY^2 - S\hat{Y}^2$.

14.5. How could you test the hypothesis that ft = ft? 

14.6. Set up the model and normal equations for $r = 2$.

(a) Show that

\[ c_{11} = Sx_2^2/DSx_1^2, \qquad c_{12} = c_{21} = -Sx_1x_2/DSx_1^2, \qquad c_{22} = 1/D, \]

where $D = Sx_2^2 - (Sx_1x_2)^2/Sx_1^2$.

(b) Show that

\[ b_1 = \beta_1 + c_{11} Sx_1\epsilon + c_{12} Sx_2\epsilon. \]

Set up the same equation for $b_2$. Using these results, prove algebraically
that

\[ \sigma^2(b_1) = c_{11}\sigma^2, \qquad \sigma^2(b_2) = c_{22}\sigma^2, \qquad \sigma(b_1b_2) = c_{12}\sigma^2. \]

(c) Prove that $b_1$ and $b_2$ can be estimated in a two-stage estimation
procedure:

(i) $y = b_1' x_1 + e_1$; $x_2 = b_x x_1 + e_x$,

(ii) $e_1 = b_2 e_x + e_2 = b_2(x_2 - b_x x_1) + e_2$.

Determine $b_1'$, $b_x$, and $b_2$ by least squares, and show that

\[ y = b_1' x_1 + b_2\left(x_2 - \frac{Sx_1x_2}{Sx_1^2} x_1\right) + e_2 \]

and

\[ b_1 = b_1' - b_2 \frac{Sx_1x_2}{Sx_1^2}. \]

Hence the regression of $y$ on $x_1$ and $x_2$ can be regarded as the regression
on $x_1$ (with regression coefficient $b_1'$) and on $x_2$ adjusted for $x_1$ (with
regression coefficient $b_2$), where $x_2 - (Sx_1x_2/Sx_1^2)x_1$ is the value of $x_2$ adjusted
for $x_1$.

(d) Using the results in (c), show that

\[ SSR = \frac{(Sx_1y)^2}{Sx_1^2} + \frac{(Se_xy)^2}{Se_x^2} = z_1^2 + z_2^2. \]

(e) Finally show that

\[ E(z_1^2) = \left(\beta_1 + \beta_2 \frac{Sx_1x_2}{Sx_1^2}\right)^2 Sx_1^2 + \sigma^2, \qquad E(z_2^2) = D\beta_2^2 + \sigma^2, \]

\[ E(SSR) = \beta_1^2 Sx_1^2 + 2\beta_1\beta_2 Sx_1x_2 + \beta_2^2 Sx_2^2 + 2\sigma^2, \qquad E(SSE) = (n - 3)\sigma^2. \]

Set up the analysis-of-variance table to summarize these results.

(f) How would you test the null hypothesis that $\beta_2 = 0$? $\beta_1 = \beta_2 = 0$?

(g) Can you set up a two-stage estimating procedure to test the null
hypothesis that $\beta_1 = 0$? How would you fit these results into the
analysis-of-variance table?

(h) What are the variances of $\hat{Y}'$ and $e'$ for predicting $Y$ from $X_1 = X_1'$,
$X_2 = X_2'$?

† Exercise 14.6 should be worked by everyone.

14.7. Given $Y = \mu + \beta_1 x_1 + \beta_2 x_2 + \epsilon$, but we estimate only

\[ \hat{Y} = \bar{Y} + b_1' x_1. \]

Show that $b_1'$ is a biased estimate of $\beta_1$ and that $s^2$ is also a biased estimate
of $\sigma^2$.

14.8. Given $Y = \mu + \beta_1' x_1 + \epsilon$, but we estimate with

\[ \hat{Y} = \bar{Y} + b_1 x_1 + b_2 x_2. \]

Determine whether or not $b_1$ and $s^2$ are unbiased estimates of $\beta_1'$ and $\sigma^2$,
respectively.

14.9. In the analysis of the data given in Exercise 13.4, it was thought
advisable to use as a second fixed variate ($X_2$) the value of $Y$ for the
previous year: 77.4, 87.4, 97.6, . . . , 110.7. Use the results in Exercise
14.6 to:

(a) Estimate the two regression coefficients and their standard errors.

(b) Set up the complete analysis of variance, as in (e) and (g).

(c) Estimate the average $Y$ for $X_1' = 100$ and $X_2' = 100$ and the
standard error of this estimate.

14.10. Show that the normal equations are also the maximum-likelihood
equations for estimating the $\beta$'s.

14.11. Use the orthogonal transformations of Sec. 14.3 for $r = 5$.
Note that $\sigma^2(b_i') = \sigma^2$ and $\sigma(b_i'b_j') = 0$. Show that

(a) $b_5 = w_{55} b_5'$ and $b_4 = w_{44} b_4' + w_{54} b_5'$.

(b) The added reduction in residual sum of squares due to the last two
fixed variates ($X_4$ and $X_5$) over the reduction due to the first three fixed
variates ($X_1$, $X_2$, and $X_3$) alone is given by

\[
\frac{w_{55}^2 b_4^2 - 2 w_{54} w_{55} b_4 b_5 + (w_{44}^2 + w_{54}^2) b_5^2}{w_{44}^2 w_{55}^2}
= \frac{c_{55} b_4^2 - 2 c_{45} b_4 b_5 + c_{44} b_5^2}{c_{44} c_{55} - c_{45}^2}.
\]

14.12. (a) Show that, for $r = 2$,

\[ SSE_w = Swy^2 - \frac{(Swxy)^2}{Swx^2}; \qquad s_w^2 = SSE_w/(n - 2). \]

What would be the value of $SSE_w$ for Example 14.2?

(b) Also show that $\sigma^2(\bar{Y}_w) = \sigma^2/Sw$, $\sigma^2(b) = \sigma^2/Swx^2$,

\[ \sigma^2(\hat{Y}') = \sigma^2\left[\frac{1}{Sw} + \frac{x'^2}{Swx^2}\right], \qquad \text{and} \qquad
\sigma^2(e') = \sigma^2\left[\frac{1}{w'} + \frac{1}{Sw} + \frac{x'^2}{Swx^2}\right], \]

where $x' = (X' - \bar{X}_w)$.

(c) In Example 14.3, determine 95 per cent confidence limits for $\beta$
and for $E(Y')$ and $Y'$ when $X' = 80$. What weight do you use for
$X' = 80$?

14.13. Use the data given in Tables 13.1 and 14.1 for incomes less
than $4,000 to compute a weighted regression of $Y$ on $X$.

14.14. G. A. Baker 10 furnishes an example on ovarian weights (in
milligrams) of rats receiving five dosages of Serum XII, 16 rats for each
dosage.

    Dosage in rat units (X)      4        6        8       12       16
    Mean weight (Y)            27.00    38.31    43.44    65.25    72.88
    s²(Y)                       1.59     3.96     3.64    18.96    45.95
    Smoothed weights (w)        7.72     2.16     1.00      .37      .19

The "smoothed weights" were found by smoothing the values of $s(Y)$.

(a) Show that, if we use the "smoothed weights,"

\[ \hat{Y} = -15.13 + 69.43 \log_{10} X, \]

where $\log_{10} X$ is used as the independent variate.

(b) Use as weights reciprocals of $s^2(Y)$ to derive another estimate of $\hat{Y}$.

14.15. It is evident from Table 14.1 that the variances increase
systematically with $X$. If there is a perfect correlation between $X$ and $\sigma(Y)$, it
can be shown that we should use a logarithmic relationship as was done
in Exercise 13.11 (see reference 11). For all incomes, $s(Y)$ actually
increases faster than $X$ but not for incomes less than $4,000. The
individual values of $s^2(Z)$ and $s'^2(Z)$ are given below for the 12 groups in
Table 14.1, where $Z = \log (Y + 1)$ and $s'^2(Z)$ is used for the incomes
under $4,000. The variances have been multiplied by 10.

    Group      1     2     3     4     5     6     7     8     9    10    11    12
    s²(Z)    .196  .220  .199  .174  .188  .196  .152  .296  .434  .329  .354  .548
    s'²(Z)   .196  .220  .182  .174  .188  .196  .152  .296  .336  .200  .285  .232

(a) Use Bartlett's test (Chap. 12) to test for unequal variances for
$s'^2(Z)$. Is there a significant upward trend for these variances?

(b) What do you conclude about $s^2(Z)$?

(c) How would you determine whether or not a log transformation
would be a useful one?

CHAPTER 15


COMPUTATIONAL METHODS AND METHODS OF ANALYSIS 
FOR A GENERAL REGRESSION MODEL 

15.1. Introduction. In general, if r is large, some short-cut procedure 
is needed to solve for the values. Snedecor 1 and R. A. Fisher 2 discuss 
the use of the C matrix and the Doolittle method of obtaining it. Waugh 
and Dwyer 3 and Dwyer 4 present a variety of computational methods. 
Dwyer 4 outlines the theoretical background for matrix inversion and 
presents numerous examples. 

We shall present an example for $r = 4$, using some data collected by
the Southern Cooperative Group 5 to estimate the quantity of vitamin
B2 in milligrams per gram for turnip greens ($Y$) from a knowledge of
the radiation in relative gram calories per square centimeter per minute
during the preceding half day of sunlight ($X_1$), average soil moisture
tension ($X_2$), air temperature in degrees Fahrenheit ($X_3$), and the product
($X_1 \cdot X_2$) or ($X_4$). In all, 27 sets of observations were taken on these
variates. In order to simplify the computations, these variates were
coded as follows: $X_1$ and $X_2$ were divided by 100, $X_3$ was divided by 10,
and $X_4$ was divided by 10,000. In general it is advisable to code the
original data so that the $a_{ij}$ ($Sx_ix_j$) are reduced to values between 1 and
10, if possible. The coded data are presented in Table 15.1.

The matrices of the sums of squares and cross products are

\[
A = \begin{bmatrix}
10.25767 & .03798 & 6.87167 & 1.17904 \\
         & 1.11550 & -.06320 & 1.99828 \\
         &         & 15.94667 & .166956 \\
         &         &          & 3.94418
\end{bmatrix}, \qquad
G = \begin{bmatrix} 31.6650 \\ -86.8374 \\ 46.7156 \\ -152.8797 \end{bmatrix}, \qquad
B = \begin{bmatrix} b_1 \\ b_2 \\ b_3 \\ b_4 \end{bmatrix}.
\]

The elements of the $A$ matrix are represented as $a_{ij}$ and the $G$ matrix as
$g_i$ ($Sx_iy$), while the elements of the $B$ matrix are the estimates of the
population regression coefficients.

Two methods of obtaining the inverse matrix will be presented, the
Doolittle method and the abbreviated Doolittle method. The latter is often
called the Gauss-Doolittle method.

Table 15.1

               Y        X1       X2       X3       X4
             110.4     1.76     .070      7.8     .123
             102.8     1.55     .070      8.9     .108
             101.0     2.73     .070      8.9     .191
             108.4     2.73     .070      7.2     .191
             100.7     2.56     .070      8.4     .179
             100.3     2.80     .070      8.7     .196
             102.0     2.80     .070      7.4     .196
              93.7     1.84     .070      8.7     .128
              98.9     2.16     .070      8.8     .151
              96.6     1.98     .020      7.6     .039
              99.4      .59     .020      6.5     .011
              96.2      .80     .020      6.7     .016
              99.0      .80     .020      6.2     .016
              88.4     1.05     .020      7.0     .021
              75.3     1.80     .020      7.3     .036
              92.0     1.80     .020      6.5     .036
              82.4     1.77     .020      7.6     .035
              77.1     2.30     .020      8.2     .046
              74.0     2.03     .474      7.6     .962
              65.7     1.91     .474      8.3     .905
              56.8     1.91     .474      8.2     .905
              62.1     1.91     .474      6.9     .905
              61.0      .76     .474      7.4     .360
              53.2     2.13     .474      7.6    1.009
              59.4     2.13     .474      6.9    1.009
              58.7     1.51     .474      7.5     .715
              58.0     2.05     .474      7.6     .971
    Total  2,273.5    50.16    5.076    206.4    9.460

15.2. The Doolittle Method of Inverting a Matrix. This method is 
used by Fisher 2 and Snedecor. 1 The computations are presented in 
Table 15.2 with accompanying explanations. 

The procedures followed in the computations in Table 15.2 were: 

I. Write the original A matrix on the left, an identity matrix on the right
(zeros everywhere except 1's on the main diagonal), and a check column
(the sum of all elements in each row). In all of the computations which
follow, the same procedure is followed with the check column. Then if
the computing was done correctly, the sum of each row will equal the
value in the check column.

II. Divide each row by the element in the $x_4$ column of the A matrix.
Be sure to carry at least five significant digits, and preferably six, in all
quotients. Remember that the important point is the number of
significant digits and not the number of decimal places.



[Table 15.2: A matrix (left), identity matrix (right), check column]

III. Subtract the last line from all the others, dropping the $x_4$ column
in the left-hand matrix.

IV. Divide by the elements of the right column in this new left-hand
matrix.

V. Again subtract the last line from the others and now drop the $x_3$
column.

VI, VII. Same as above.

VIII. The first line gives the $c_{1j}$ values, which are computed by dividing
the elements in (VII) by the left-hand number, -9.20326.

IX. Substitute in turn the $c_{1j}$ values in either line of the left-hand
matrix of (VI) as follows to determine $c_{2j}$. For example,

\[
c_{21} = \left\{ \begin{array}{c}
-2.01565 - (-14.08052)(.219015) \\
\text{or} \\
0 - (-4.87726)(.219015)
\end{array} \right\} = 1.06819.
\]

Note that $c_{21}$ should equal $c_{12}$, except for rounding errors.

\[
c_{22} = \left\{ \begin{array}{c}
0 - (-14.08052)(1.06819) \\
\text{or} \\
9.83081 - (-4.87726)(1.06819)
\end{array} \right\} = 15.04066.
\]

X. Next substitute the values of $c_{1j}$ and $c_{2j}$ in any of the equations in
(IV) and solve for $c_{3j}$ as in (IX) for $c_{2j}$. For example, the three equations give for $c_{31}$

\begin{align*}
.146590 - (1.45200)(.219015) - (-.081998)(1.06819) &= -.083830, \\
0 - (3.78493)(.219015) - (-.697558)(1.06819) &= -.083832, \\
0 - (.42798)(.219015) - (-.009272)(1.06819) &= -.083830.
\end{align*}

We note that two of these equations give the same result to five significant
digits (six decimal places), while the middle one deviates very slightly
from these two. We have used the value given in (VIII) for $c_{13}$.
Rounding errors are quite a problem in matrix-inversion calculations. Hence
it is advisable to carry several unnecessary digits at first in order to be
able to drop digits as the computing proceeds and end up with as many
as were thought necessary.

XI. Next substitute the values of $c_{1j}$, $c_{2j}$, and $c_{3j}$ in any of the equations
in (II), and solve for $c_{4j}$. For example, the four equations give for $c_{41}$

\begin{align*}
.848148 - 8.70002c_{11} - .032213c_{21} - 5.82819c_{31} &= -.603125, \\
0 - .019006c_{11} - .558230c_{21} + .031627c_{31} &= -.603110, \\
0 - 41.15857c_{11} + .378543c_{21} - 95.51421c_{31} &= -.603128, \\
0 - .298932c_{11} - .506640c_{21} - .042331c_{31} &= -.603111.
\end{align*}

In this case two equations are out of line, and two almost agree with $c_{14}$.
In general it is a good practice to omit equations like the first and third,
because several of the elements are so large. These large coefficients
(such as 41 and 95) are subject to a much greater rounding error, because
extra digits are needed to give six-decimal accuracy. That is, we are
trying to calculate $c_{41}$ to six decimal places; hence, the individual
multipliers should be accurate to six decimal places. But for numbers like
41 and 95 to have six decimal places, eight significant digits are required,
and we cannot obtain eight significant digits in our computing unless
we use several more in the original A matrix.

The C matrix has now been completed, but the computer should have
some more checking besides the check column. The final check is to
find whether or not the product of the A and C matrices is the identity
matrix. That is,

\[ AC = D \doteq I, \]

where I contains only 1's in the main diagonal. To compute the element
in the ith row and jth column of AC, we find the sum of the products of
corresponding elements in the ith row of A and the jth column of C.
Let $a_{ij}$ be the elements in A and $c_{ij}$ those in C. Then the (i,j) element in
D is

\[ d_{ij} = \sum_{k=1}^{4} a_{ik}c_{kj}. \]

We generally compute the diagonal elements $d_{ii}$ to see whether or not
they are nearly 1 (to the desired degree of accuracy). In the vitamin
B_2 example

\begin{align*}
d_{11} &= (10.25767)(.219015) + \cdots + (1.17904)(-.603108) = 1.0000198, \\
d_{22} &= (.03798)(1.06819) + \cdots + (1.99828)(-7.92605) = 1.0000380, \\
d_{33} &= 1.0000014, \\
d_{44} &= 1.0000057.
\end{align*}

The last two diagonal elements are within the desired five-decimal-place
accuracy, but $d_{11}$ and $d_{22}$ only to four places. Hence if we use the present
C matrix, we cannot hope for more than four-place accuracy in the
b's, and possibly only three-place accuracy.
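
The check $AC \doteq I$ is easy to mechanize. The following sketch assumes the six-decimal inverse printed later in this section as $C''$ (the first C matrix is not printed in full here), and the helper name `matmul` is an invention of this example.

```python
# A matrix of the vitamin B2 example (full symmetric form).
A = [
    [10.25767, 0.03798, 6.87167, 1.17904],
    [0.03798, 1.11550, -0.06320, 1.99828],
    [6.87167, -0.06320, 15.94667, 0.166956],
    [1.17904, 1.99828, 0.166956, 3.94418],
]
# Six-decimal inverse C'' as printed later in this section.
C = [
    [0.219012, 1.068180, -0.083828, -0.603104],
    [1.068180, 15.040626, -0.317704, -7.926049],
    [-0.083828, -0.317704, 0.095668, 0.181971],
    [-0.603104, -7.926049, 0.181971, 4.441777],
]


def matmul(p, q):
    """Row-by-column product: d_ij = sum over k of p_ik * q_kj."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


D = matmul(A, C)
# Largest departure of D = AC from the identity matrix.
max_dev = max(
    abs(D[i][j] - (1.0 if i == j else 0.0)) for i in range(4) for j in range(4)
)
print([round(D[i][i], 7) for i in range(4)], max_dev)
```

Inspecting the off-diagonal elements as well as the diagonal gives a somewhat stronger check than the diagonal alone.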

XII. The sums of the cross products of the $x_i$ with $y$ are given here
($g_i = Sx_iy$).

XIII. $b_i = \sum_{j=1}^{4} c_{ij}g_j$. For example,

\[ b_1 = (.219015)(31.6650) + \cdots + (-.603108)(-152.8797) = 2.4631. \]

At this stage it is advisable either to recompute the $b_i$ or to substitute
these values in the original normal equations. For example,

\[ 10.25767b_1 + \cdots + 1.17904b_4 = 31.6650. \]

Substituting the above values of the $b_i$, this sum is 31.6614, indicating
slight inaccuracies. More nearly exact values are given by use of $C''$
below.

XVI. \begin{align*}
SSR &= \sum b_ig_i = (2.4631)(31.6650) + \cdots + (-1.3769)(-152.8797) = 6,908.04, \\
SSE &= Sy^2 - SSR = 9,150.53 - 6,908.04 = 2,242.49, \\
s^2 &= SSE/22 = 101.93.
\end{align*}

XIV. The standard error of $b_i$ is $s(b_i) = \sqrt{c_{ii}s^2}$.

XV. $t = |b_i|/s(b_i)$.

If it is desired to improve the results to more significant figures, we can
use an iterative device advanced by Hotelling. 6 The procedure is as
follows:

(1) Compute the matrix (2 - AC), which is the same as AC, except
the diagonal elements are subtracted from 2 and the signs of all other
elements are reversed.

\[
(2 - AC) = \begin{bmatrix}
.9999802 & -.0000771 & .0000044 & .0000195 \\
-.0000035 & .9999620 & -.0000008 & -.0000046 \\
-.0000033 & -.0000092 & .9999986 & -.0000124 \\
-.0000079 & -.0000778 & -.0000013 & .9999943
\end{bmatrix}
\]

(2) Now compute $C(2 - AC) = C''$, the new C matrix.

\[
C'' = \begin{bmatrix}
.219012 & 1.068180 & -.083828 & -.603104 \\
1.068180 & 15.040626 & -.317704 & -7.926049 \\
-.083828 & -.317704 & .095668 & .181971 \\
-.603104 & -7.926049 & .181971 & 4.441777
\end{bmatrix}
\]

(3) $b'' = [2.46332 \quad -75.3747 \quad 1.58369 \quad -1.37645]$.

(4) Substituting in the normal equations, we obtain as estimates of
the $g_i$

[31.6649 -86.8375 46.7156 -152.8800],

as compared with the exact values

[31.6650 -86.8374 46.7156 -152.8797].

(5) If we use these values of $b''$, we find that $SSR = \sum b''_ig_i = 6,907.76$
as compared with 6,908.04 above. Hence there is no real difference in
the results by using the $b''$.
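
One pass of the Hotelling device can be sketched as follows. The starting matrix here is an assumption made purely for illustration: it is the $C''$ matrix above rounded to four decimals, standing in for any rough first inverse. The function names are likewise hypothetical.

```python
A = [
    [10.25767, 0.03798, 6.87167, 1.17904],
    [0.03798, 1.11550, -0.06320, 1.99828],
    [6.87167, -0.06320, 15.94667, 0.166956],
    [1.17904, 1.99828, 0.166956, 3.94418],
]
# Rough starting inverse (assumed for illustration): C'' rounded to 4 decimals.
C = [
    [0.2190, 1.0682, -0.0838, -0.6031],
    [1.0682, 15.0406, -0.3177, -7.9260],
    [-0.0838, -0.3177, 0.0957, 0.1820],
    [-0.6031, -7.9260, 0.1820, 4.4418],
]


def matmul(p, q):
    """Row-by-column matrix product."""
    return [
        [sum(p[i][k] * q[k][j] for k in range(len(q))) for j in range(len(q[0]))]
        for i in range(len(p))
    ]


def max_error(a, c):
    """Largest absolute element of AC - I."""
    d = matmul(a, c)
    n = len(a)
    return max(
        abs(d[i][j] - (1.0 if i == j else 0.0)) for i in range(n) for j in range(n)
    )


def hotelling_step(a, c):
    """Return the improved inverse C(2 - AC)."""
    n = len(a)
    ac = matmul(a, c)
    # (2 - AC): diagonal elements subtracted from 2, other signs reversed.
    two_minus_ac = [
        [(2.0 if i == j else 0.0) - ac[i][j] for j in range(n)] for i in range(n)
    ]
    return matmul(c, two_minus_ac)


C_improved = hotelling_step(A, C)
print(max_error(A, C), max_error(A, C_improved))
```

If $E = I - AC$ is the error of the starting inverse, the error after one step is $E^2$, which is why a single pass usually suffices when the first inverse is already good to three or four places.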

15.3. The Abbreviated Doolittle Method. Now we shall present a
short-cut method of inverting a symmetric matrix ($a_{ij} = a_{ji}$), the
abbreviated Doolittle method, described by Dwyer. 4 - 7


Table 15.3

Forward Solution

              x1          x2          x3          x4           y        Check

I          10.25767     .037980     6.87167     1.17904     31.6650     50.0114
                       1.11550     -.063200     1.99828    -86.8374    -83.7488
                                   15.94667      .166956    46.7156     69.6377
                                                3.94418   -152.8797   -145.5912

II   A1    10.25767     .037980     6.87167     1.17904     31.6650     50.0114
     B1     1           .00370260    .669906     .114942     3.08696     4.87551

III  A2                1.11536     -.088643     1.99391    -86.9546    -83.9340
     B2                1           -.079475     1.78768    -77.9610    -75.2528

IV   A3                            11.33625     -.464424    18.5923     29.4641
     B3                             1           -.040968     1.64007     2.59910

V    A4                                          .22516      -.3104      -.0852
     B4                                         1           -1.3786      -.3784

Backward Solution (for $c_{ij}$)

            c_1j        c_2j        c_3j         c_4j

           .219003     1.06806     -.0838254    -.603037
                      15.03894     -.317667    -7.92514
                                    .0956668     .181951
                                                4.44129

The procedures followed in the abbreviated Doolittle method
computations were:

I. The upper right corner of the original A matrix is reproduced,
plus the G column ($Sx_iy$) and a check column. For the check column,
we assume the entire A matrix is present. Since the A matrix is
symmetrical, the lower corner is an image of the upper corner. We shall call
these elements $a_{ij}$.

II. $A_{1j} = a_{1j}$; the first row of (I) is reproduced.

\[ B_{1j} = A_{1j}/A_{11} = A_{1j}/10.25767. \]

III. $A_{2j} = a_{2j} - [A_{12}B_{1j} \text{ or } A_{1j}B_{12}]$, where $a_{2j}$ is the element in the
second row of (I) and $A_{12}B_{1j} = A_{1j}B_{12}$ except for rounding errors. As
we noted in commenting on rounding errors for the Doolittle method, it
is advisable to choose from these two ($A_{12}B_{1j}$ or $A_{1j}B_{12}$) the one for which
the two members are more nearly equal.

\begin{align*}
A_{22} &= 1.11550 - (.037980)(.0037026) = 1.11536, \\
A_{23} &= -.063200 - \left\{ \begin{array}{c} (.037980)(.669906) \\ \text{or} \\ (.0037026)(6.87167) \end{array} \right\} = -0.088643.
\end{align*}

\[ B_{2j} = A_{2j}/A_{22}. \]

IV. $A_{3j} = a_{3j} - (A_{13}B_{1j} + A_{23}B_{2j}) = a_{3j} - (A_{1j}B_{13} + A_{2j}B_{23})$.

\[ A_{33} = 15.94667 - [(6.87167)(.669906) + (-.088643)(-.079475)] = 11.33625. \]

\[
A_{34} = .166956 - \left\{ \begin{array}{c}
(6.87167)(.114942) + (-.088643)(1.78768) \\
\text{or} \\
(1.17904)(.669906) + (-.079475)(1.99391)
\end{array} \right\}
= \left\{ \begin{array}{c} -0.464422 \\ \text{or} \\ -0.464424 \end{array} \right\}.
\]

\[ B_{3j} = A_{3j}/A_{33}. \]

V. $A_{4j} = a_{4j} - (A_{14}B_{1j} + A_{24}B_{2j} + A_{34}B_{3j}) = a_{4j} - (A_{1j}B_{14} + A_{2j}B_{24} + A_{3j}B_{34})$.

\[ A_{44} = 3.94418 - [(1.17904)(.114942) + (1.99391)(1.78768) + (.464424)(.040968)] = 0.22516. \]

\[ B_{4j} = A_{4j}/A_{44}. \]
This completes the forward solution. If the experimenter desires to
compute only the $\{b_i\}$ and the over-all reduction in sum of squares due to
regression (SSR) without the individual $s^2(b_i)$ and $s(b_ib_j)$, he can make
these computations without determining the inverse matrix, as follows:

VI. \begin{align*}
b_4 &= B_{4y} = -1.3786, \\
b_3 &= B_{3y} - B_{34}b_4 = 1.64007 + (.040968)(-1.3786) = 1.5836, \\
b_2 &= B_{2y} - B_{23}b_3 - B_{24}b_4 = -77.9610 - (-.079475)(1.5836) - (1.78768)(-1.3786) = -75.3706, \\
b_1 &= B_{1y} - B_{12}b_2 - B_{13}b_3 - B_{14}b_4 \\
    &= 3.08696 - (.0037026)(-75.3706) - (.669906)(1.5836) - (.114942)(-1.3786) = 2.4636, \\
SSR &= \sum b_iSx_iy = (2.4636)(31.6650) + \cdots + (-1.3786)(-152.8797) = 6,907.74.
\end{align*}

If we let $b'_i$ be the estimate of the last b in the Doolittle procedure when
i fixed variates are used, we note that $B_{iy} = b'_i$; in other words, we note
that $B_{iy}$ will give the value of $b_i$ if this is the last b to be computed. This
is called a completely adjusted b, since $x_i$ has been adjusted for all the
other x's. In our case with r = 4, $b'_4 = B_{4y}$.† But if only the first three
x's were used, we would omit the $x_4$ column and the $A_4$ and $B_4$ rows, so
that the regression coefficient for $x_3$ would then equal $B_{3y}$. This would
be $b'_3$. We also note that $A_{1y} = Sx_1y$ and, proceeding as in Exercise
14.6, $A_{2y} = S(x_2 \text{ adjusted for } x_1)y$, $A_{3y} = S(x_3 \text{ adjusted for } x_1 \text{ and } x_2)y$,
and $A_{4y} = S(x_4 \text{ adjusted for } x_1, x_2, \text{ and } x_3)y$. Hence, using the same
method as in Exercise 14.6, we can consider the regression equation as†

\[
y = b'_1x_1 + b'_2(x_2 \text{ adjusted for } x_1) + b'_3(x_3 \text{ adjusted for } x_1 \text{ and } x_2)
 + b'_4(x_4 \text{ adjusted for } x_1, x_2, \text{ and } x_3) + e.
\]

Hence we can write

\[
SSR = \sum_{i=1}^{4} b'_iA_{iy} = \sum_{i=1}^{4} A_{iy}B_{iy}
= (31.6650)(3.08696) + \cdots + (-.3104)(-1.3786) = 6,907.74.
\]
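
The forward solution and the back substitution of step VI can be sketched in a few lines. This is an illustrative reading of the scheme, with hypothetical function names, and it carries full machine precision rather than the five or six digits of Table 15.3, so the last digits differ slightly from the hand results.

```python
A = [
    [10.25767, 0.03798, 6.87167, 1.17904],
    [0.03798, 1.11550, -0.06320, 1.99828],
    [6.87167, -0.06320, 15.94667, 0.166956],
    [1.17904, 1.99828, 0.166956, 3.94418],
]
g = [31.6650, -86.8374, 46.7156, -152.8797]


def forward_doolittle(a, rhs):
    """Abbreviated Doolittle forward solution for a symmetric matrix.

    Row i of the result holds the elements for columns i..r-1 followed by
    the y column, so the element for column j sits at index j - i.
    Only the upper triangle of `a` is read.
    """
    r = len(rhs)
    a_rows, b_rows = [], []
    for i in range(r):
        row = [a[i][j] for j in range(i, r)] + [rhs[i]]
        # A_ij = a_ij - sum over earlier rows k of A_ki * B_kj.
        for k in range(i):
            a_ki = a_rows[k][i - k]
            for t in range(len(row)):
                row[t] -= a_ki * b_rows[k][i - k + t]
        a_rows.append(row)
        b_rows.append([v / row[0] for v in row])  # B_ij = A_ij / A_ii
    return a_rows, b_rows


def back_substitute(b_rows):
    """b_i = B_iy - sum over j > i of B_ij * b_j, starting from the last b."""
    r = len(b_rows)
    b = [0.0] * r
    for i in reversed(range(r)):
        b[i] = b_rows[i][-1] - sum(b_rows[i][j - i] * b[j] for j in range(i + 1, r))
    return b


A_rows, B_rows = forward_doolittle(A, g)
b = back_substitute(B_rows)
# SSR from the y column alone: sum of A_iy * B_iy.
ssr = sum(ar[-1] * br[-1] for ar, br in zip(A_rows, B_rows))
print([round(v, 4) for v in b], round(ssr, 2))
```

Dropping the last k rows of the forward solution before back substitution gives the fit for the first r - k variates, which is the saving in computation discussed in the text.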

The above results indicate that a decided saving in computation can
be made by use of the abbreviated Doolittle method if the experimenter
decides to omit the last k fixed variates from the regression model; in
this case, the computations for the first (r - k) variates need not be
changed. Of course the experimenter seldom knows which fixed variates
might be omitted before he starts the computations; hence, he does not
know which should be put last in the computing scheme. However,
in many cases, there are some fixed variates which he would like to omit
because they are costly or difficult to measure. In such a case he might
put these k fixed variates last and omit them if they contributed little
extra to SSR. For example, the added contribution of $x_3$ and $x_4$ above
to SSR is only $(A_{3y}B_{3y} + A_{4y}B_{4y}) = 30.91$, with 2 degrees of freedom.

Despite the obvious errors of rounding in computing the $b_i$ by the
abbreviated Doolittle procedure, SSR is only .02 less than the value
computed after the use of the Hotelling iterative procedure on the
Doolittle solutions.

If the computer needs the inverse (C) matrix, the computations
proceed as given for the so-called "backward" solution.

VII. First compute the $c_{i4}$ values.

\begin{align*}
c_{44} &= 1/A_{44} = 1/.22516 = 4.44129, \\
c_{34} &= -c_{44}B_{34} = -(4.44129)(-.040968) = .181951, \\
c_{24} &= -c_{34}B_{23} - c_{44}B_{24} = (.181951)(.079475) - (4.44129)(1.78768) = -7.92514, \\
c_{14} &= -c_{24}B_{12} - c_{34}B_{13} - c_{44}B_{14} = -.603037.
\end{align*}

Check by use of $a_{14}c_{14} + \cdots + a_{44}c_{44} = .99997$.

† $b'_4$ is often written as $b_{4.123}$, which reads the regression coefficient for $x_4$ adjusted for
$x_1$, $x_2$, and $x_3$. $b'_3$ would be $b_{3.12}$, and $b'_2 = b_{2.1}$, while $b'_1$ is an unadjusted $b_1$.

VIII. $c_{i3}$ values.

\begin{align*}
c_{43} &= c_{34} = .181951, \\
c_{33} &= 1/A_{33} - c_{34}B_{34} = .0882126 - (.181951)(-.040968) = .0956668, \\
c_{23} &= -c_{33}B_{23} - c_{43}B_{24} = (.0956668)(.079475) - (.181951)(1.78768) = -.317667, \\
c_{13} &= -c_{23}B_{12} - c_{33}B_{13} - c_{43}B_{14} = -.0838254.
\end{align*}

Again check by use of $a_{13}c_{13} + \cdots + a_{43}c_{43} = 1.0000008$.

IX. $c_{i2}$ values.

\begin{align*}
c_{42} &= c_{24} = -7.92514, \qquad c_{32} = c_{23} = -.317667, \\
c_{22} &= 1/A_{22} - c_{23}B_{23} - c_{24}B_{24} \\
       &= .8965715 - (-.317667)(-.079475) + (7.92514)(1.78768) = 15.03894, \\
c_{12} &= -c_{22}B_{12} - c_{23}B_{13} - c_{24}B_{14} = 1.06806,
\end{align*}

and $c_{12}a_{12} + \cdots + c_{42}a_{42} = .99993$.

X. \begin{align*}
c_{11} &= 1/A_{11} - c_{12}B_{12} - c_{13}B_{13} - c_{14}B_{14} \\
       &= .0974880 - (1.06806)(.0037026) + (.0838254)(.669906) + (.603037)(.114942) = .219003,
\end{align*}

and $c_{11}a_{11} + \cdots + c_{41}a_{41} = 1.0000002$.

XI. The solutions for the $b_i$, SSR, and $s^2(b_i)$ proceed as with the
Doolittle method,

$b_i$ = [2.46334 -75.3693 1.58356 -1.37975], SSR = 6,907.79.

These b's can be checked against those computed in (VI) above.

Again we note that some of the $b_i$ are not very accurate but that SSR
is off only slightly in the last decimal place. The Hotelling iterative
procedure can be used to improve this C matrix also. Rounding errors
seem to be of more importance with the abbreviated method; hence, it
is especially advisable to carry extra places at first in order to secure the
desired accuracy at the end without having to use the Hotelling iterative
device to improve the accuracy. If it is known in advance that the C
matrix will be needed, an identity matrix can be carried on the right of
the abbreviated Doolittle matrix as with the Doolittle matrix and the
same computations used there as in steps I to VI; the computing
procedure is described on page 191 of reference 4.

15.4. Analysis of the Results. The experimenter will generally look
at the simple regressions of Y on each of the fixed variates first, in order
to obtain some idea of the usefulness of each of these fixed variates.
This does not mean that a fixed variate will be omitted if its simple
regression coefficient was not significantly different from zero; there may be
good theoretical reasons for adjusting all other fixed variates for one
which, of itself, was not too important as a predictor. But if the
experimenter had first run the simple regression of Y on $X_2$, he would have
discovered a highly significant relationship; yet the t value was not
significant when all four X's were used. And he would be even more
perplexed when he studied the following analysis of variance for all four
fixed variates (based on the Doolittle solution):


Source of variation                       Degrees of    Sum of squares    Mean square
                                          freedom

Regression (4 variates)                        4           6,907.76         1,726.94
Regression ($X_2$ only)                        1           6,759.98         6,759.98
Added reduction due to $X_1$, $X_3$, $X_4$     3             147.78            49.26
Error                                         22           2,242.77           101.94


From this analysis of variance, we conclude that the over-all reduction
due to the use of the regression equation is highly significant

\[ F = \frac{1,726.94}{101.94} = 16.94 \]

and that the added reduction due to the three fixed variates other than
$X_2$ is decidedly nonsignificant. Hence we are led to conclude that $X_2$
is a very important predictor and that the other predictors add nothing
to the reliability of the estimate of vitamin B_2 content.

Why then is $b_2$ not highly significant when all four X's are used in the
model? The answer lies in the peculiar nature of $X_4$, which is the product
of $X_1$ and $X_2$, causing $X_4$ and $X_2$ to be highly correlated. Hence $b_2$ and
$b_4$ are also highly correlated, so that the actual influence of $X_2$ on Y is
split into a part contributed by $b_2$ and another part contributed by $b_4$.
It is impossible to interpret $b_2$ as the change in Y when $X_2$ varies while
holding the other X's constant, because $X_4$ will vary when $X_2$ varies.
The change in $\hat{Y}$ when $X_2$ varies is given by

\[ z = \frac{\partial \hat{Y}}{\partial X_2} = b_2 + b_4X_1. \]

The average value of this change is

\[ \bar{z} = b_2 + b_4\bar{X}_1 = -75.3747 - 1.3765(1.8578) = -77.9320. \]

This average change is almost the same as $b'_2$, the average change in Y for
a unit change in $X_2$, neglecting all other fixed variates. The estimated
variance of this average change, $\bar{z}$, is

\begin{align*}
s^2(\bar{z}) &= s^2(b_2) + (\bar{X}_1)^2s^2(b_4) + 2\bar{X}_1s(b_2b_4) \\
 &= [15.04063 + 4.44178(1.8578)^2 - 2(1.8578)(7.92605)](101.94) \\
 &= (.92105)(101.94) = 93.893,
\end{align*}

and the standard error is 9.6899, so that

\[ t = \frac{\bar{z}}{s(\bar{z})} = \frac{-77.9320}{9.6899} = -8.04, \]

a highly significant value. We have used (2) and (3) (page 196), in
this computing.
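
The arithmetic for the average change and its standard error can be retraced as below. The variable names are illustrative only; the numerical inputs are the values quoted in the text, and $\bar{X}_1$ is taken as 50.16/27 from the totals of Table 15.1.

```python
import math

# Regression coefficients from the Hotelling-improved solution.
b2, b4 = -75.3747, -1.3765
# Mean of X1 from the Table 15.1 total (27 observations).
x1_bar = 50.16 / 27
# Elements of the inverse matrix used in the text, and the error mean square.
c22, c44, c24 = 15.04063, 4.44178, -7.92605
s2 = 101.94

# Average change in predicted Y per unit change in X2.
z_bar = b2 + b4 * x1_bar
# Variance of a linear function of b2 and b4:
# s^2(b2) + X1bar^2 s^2(b4) + 2 X1bar s(b2 b4), with s(b_i b_j) = c_ij s^2.
var_z = (c22 + x1_bar ** 2 * c44 + 2 * x1_bar * c24) * s2
se_z = math.sqrt(var_z)
t = z_bar / se_z
print(round(z_bar, 4), round(se_z, 4), round(t, 2))
```

Because the bracketed quantity is a small difference of large terms, the result is sensitive to the number of digits carried in the $c_{ij}$ and in $\bar{X}_1$.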

This example was selected because it illustrates some of the difficulties
of interpreting regression analyses when some of the fixed variates are
closely related to one another. If two fixed variates are highly
correlated, it is unrealistic to assume that one can be held constant while the
other varies. A multiple regression coefficient can be interpreted only
as the average change in Y for a unit change in $X_i$ when the other X's
are not changed. Hence to interpret two highly correlated regression
coefficients, we should like to know the relationship between the X's in
order to study the real change in Y when one of the X's changes. This is
not to say that the use of a fixed variate like $X_4$ ($= X_1X_2$) is undesirable.
On the contrary, it is often quite desirable to be able to say how z differs
for various values of $X_1$. For example, if $X_1$ were temperature and $X_2$
were rainfall, then it would be highly desirable to know how the effect
of 1 inch of rainfall varied for different temperatures, $X_1$. We know that
for most crops high rainfall may be detrimental at low temperatures but
quite valuable at high temperatures. Hence a knowledge of the
regression of yield on temperature and rainfall alone would be rather useless
unless the cross-product term were also included (see reference 8 for
an example of such a study). In some analyses, it may be impractical to
consider that any of the fixed variates can be held constant while some
other one varies. In this case the regression equation should be
considered as a whole. One example of this is given in Exercise 15.4.

The estimated variance of the average value of Y for a fixed set of
X's, $\{X'_i\}$, can be computed directly from the $\{c_{ij}\}$ and the value of $s^2$.
Given

\[ \hat{Y}' = \bar{Y} + \sum_{i=1}^{r} b_i(X'_i - \bar{X}_i), \]

\[ s^2(\hat{Y}') = s^2\left[\frac{1}{n} + \sum_{i=1}^{r}\sum_{j=1}^{r} c_{ij}x'_ix'_j\right], \]

where $x'_i = X'_i - \bar{X}_i$. Using the values of the $c_{ij}$ found by the use of
the Hotelling iterative procedure (page 196), we have for our vitamin
B_2 example

\begin{align*}
s^2(\hat{Y}') = 101.94\,[\,\tfrac{1}{27}
 &+ x'_1(.219012x'_1 + 1.068180x'_2 - .083828x'_3 - .603104x'_4) \\
 &+ x'_2(1.068180x'_1 + 15.040626x'_2 - .317704x'_3 - 7.926049x'_4) \\
 &+ x'_3(-.083828x'_1 - .317704x'_2 + .095668x'_3 + .181971x'_4) \\
 &+ x'_4(-.603104x'_1 - 7.926049x'_2 + .181971x'_3 + 4.441777x'_4)\,].
\end{align*}

Similarly the estimated variance of a single predicted value of Y is given
by

\[ s^2(\hat{Y}') + s^2. \]

Confidence limits can be assigned to the various estimates as follows:

\begin{align*}
b_i - t_\alpha s(b_i) &< \beta_i < b_i + t_\alpha s(b_i), \\
\hat{Y}' - t_\alpha s(\hat{Y}') &< E[Y|\{X'_i\}] < \hat{Y}' + t_\alpha s(\hat{Y}'), \\
\hat{Y}' - t_\alpha\sqrt{s^2(\hat{Y}') + s^2} &< Y|\{X'_i\} < \hat{Y}' + t_\alpha\sqrt{s^2(\hat{Y}') + s^2},
\end{align*}

where

\[ P(|t| > t_\alpha) = \alpha. \]
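
The variance of a predicted value is a quadratic form in the deviations $x'_i$, which the following sketch evaluates. The function names are hypothetical, and the tabled value $t_{.05} = 2.074$ for 22 degrees of freedom is supplied by hand as an assumption of this example.

```python
import math

# Improved inverse matrix C'' and error mean square from the example.
C = [
    [0.219012, 1.068180, -0.083828, -0.603104],
    [1.068180, 15.040626, -0.317704, -7.926049],
    [-0.083828, -0.317704, 0.095668, 0.181971],
    [-0.603104, -7.926049, 0.181971, 4.441777],
]
s2 = 101.94
n = 27


def var_mean_prediction(x_dev, c, s2, n):
    """s^2 * (1/n + sum_i sum_j c_ij x'_i x'_j) for deviations x' from the means."""
    r = len(x_dev)
    quad = sum(x_dev[i] * c[i][j] * x_dev[j] for i in range(r) for j in range(r))
    return s2 * (1.0 / n + quad)


def confidence_limits(estimate, variance, t_alpha):
    """Limits estimate -/+ t_alpha * sqrt(variance)."""
    half_width = t_alpha * math.sqrt(variance)
    return estimate - half_width, estimate + half_width


# At the point of means every deviation is zero and the variance is s^2/n.
y_bar = 2273.5 / 27
v_mean = var_mean_prediction([0.0, 0.0, 0.0, 0.0], C, s2, n)
# Limits for the mean of Y, then for a single predicted Y (variance + s^2).
limits_mean = confidence_limits(y_bar, v_mean, 2.074)
limits_single = confidence_limits(y_bar, v_mean + s2, 2.074)
print(limits_mean, limits_single)
```

The limits for a single predicted value are always wider than those for the mean, since $s^2$ is added to the variance.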


EXERCISES 


15.1. Show how the sum of squares for the regression of Y on X x and 
X 2 alone could be obtained in Sec. 15.4. 

15.2. Select some data in a familiar field of application with one 
dependent and at least three fixed variates, and carry out the calculations 
leading to the estimates of the regression coefficients and their standard 
errors. Investigate the usefulness of the various fixed variates, and indi¬ 
cate whether or not any should be discarded. 

15.3. (a) Show that if $X_3$ is omitted from the regression equation for
the vitamin B_2 example, the new regression coefficients ($b''_i$) and c values
($c''_{ij}$) are

\[ b''_i = b_i - \frac{c_{i3}b_3}{c_{33}}, \qquad c''_{ij} = c_{ij} - \frac{c_{i3}c_{3j}}{c_{33}}. \]

(See reference 2.)

(b) See reference 9 for the adjustments needed if one extra fixed variate
is used.

15.4. A cost study was made of 89 dairy farms in 1941. 10 The
dependent variate was the amount of milk sold (Y) with the following fixed
variates: amount of concentrates ($X_1$), amount of silage ($X_2$), pasture
cost ($X_3$), and amount of roughage ($X_4$). The means and sums of squares
and cross products adjusted for the means are given below, with the
amounts in thousand pounds per cow and the pasture cost in $10 per cow.

           x1          x2          x3          x4           y

x1       50.5154    -66.1617    -4.84289     -.937732    36.7974
x2                  967.1077    13.5895     32.4425      39.0556
x3                              12.5457    -12.5195       7.02815
x4                                         192.3053       9.99432
y                                                       113.5872

Mean      2.94310     3.90647    1.16426     3.60326      5.73994

(a) Compute the regression coefficients and their standard errors.
Use the abbreviated Doolittle method, and complete the backward
solution also.

(b) Show that all regression coefficients except $b_4$ are significant at the
1 per cent probability level.

(c) Use the y column in the forward solution to determine the error
variance when $x_4$ is removed from the prediction equation.

(d) In 1941 the cost per thousand pounds was $18 for concentrates, 
$2.70 for silage, and $8.50 for roughage and milk sold for 3.2 cents per 
pound. Estimate the profit or loss and its standard error if 3,200 pounds 
of concentrates, 4,000 pounds of silage, 3,000 pounds of roughage, and 
$15 worth of pasture were used per cow. 

15.5. J. T. Wakeley 11 has compiled some soils weather data and data
on the vitamin content of turnip greens, contributed by the Georgia
Agricultural Experiment Station for the Southern Cooperative Group.
The dependent variates are milligrams of ascorbic acid [$Y_1$] and
micrograms of riboflavin [$Y_2$], each per 100 milligrams of dry weight. The
fixed variates are soil moisture tension (atmospheres ÷ 10) [$X_1$] and mean
temperature (degrees Fahrenheit ÷ 100) [$X_2$], each at 8 inches depth; total
radiation in gram calories per square centimeter per minute ÷ 1,000 [$X_3$]
and evaporation in centimeters [$X_4$], each for the previous 48 hours; and
number of days since planting ÷ 100 [$X_5$].

The means and sums of squares and cross products are:

           x1         x2         x3         x4         x5        Mean

x1       6.6510     2.6722     4.4957     2.6338     .14360     .5475
x2                  4.8750     8.3609     4.3597   -3.2533     2.7156
x3                            19.8174     9.5315   -6.6767     1.4353
x4                                        5.2394   -3.1452      .5810
x5                                                  3.7799     1.0881
y1       -.60160   -1.5771    -1.6953     -.81984    .94173    2.8497
y2      -3.2590     7.1768    10.9862     5.9285   -7.1415     6.1834

$Sy_1^2 = 1.5025$, $Sy_2^2 = 28.6900$, n = 32.

accompanying tables.


Data of Exercise 15.6 


Y i 


Y a 


1.55 

1.63 

1.66 

1.52 

1.70 

1.68 

1.78 

1.57 

1.60 

1.52 

1.68 

1.74 

1.93 

1.77 

1.94 
1.83 
2.09 
1.72 
1.49 

1.52 
1.64 
1.40 

1.78 
1.93 

1.53 


20.05 

12.58 

18.56 

18.56 

14.02 

15.64 

14.52 

18.52 
17.84 
13.38 
17.55 
17.97 
14.66 

17.31 

14.32 
15.05 
15.47 

16.85 
17.42 

18.55 
18.74 
14.79 

18.86 
15.62 

18.56 


1.38 

2.64 
1.56 
2.22 

2.85 
1 .24 

2.86 
2.18 

1.65 
3.28 
1.56 
2.00 
2.88 
1.36 

2.66 
2.43 
2.42 
2.16 
2.12 
1.87 
2.10 
2.21 
2.00 
2.26 
2.14 


Sum 42.20 

Sx> or Sy> .6690 


415.39 

101.4644 


54.03 

6.5921 


2.02 

2.62 

2.08 

2.20 

2.38 

2.03 

2.87 

1.88 
1.93 
2.57 
1.95 
2.03 
2.50 
1.72 
2.53 
1.90 
2.18 
2.16 
2.14 
1.98 
1.89 
2.07 
2.08 
2.21 
2.00 

~53.92 
1.8311 


r 

U 

x 6 

x 6 

.51 

3.47 

.91 

.50 

4.57 

1.25 

.43 

3.52 

.82 

.52 

3.69 

.97 

.42 

4.01 

1.12 

.44 

2.79 

.82 

.50 

3.92 

1.06 

.49 

3.58 

1.01 

.56 

3.57 

.92 

.51 

4.38 

1.22 

.48 

3.28 

.81 

.50 

3.31 

.98 

.48 

3.72 

1.04 

.52 

3.10 

.78 

.50 

3.48 

.93 

.46 

3.48 

.90 

.48 

3.16 

.86 

.49 

3.68 

.95 

.48 

3.28 

1.06 

.48 

3.56 

.84 

.53 

3.56 

1.02 

.52 

3.49 

1.04 

.50 

3.30 

.80 

.44 

4.16 

.92 

.51 

3.73 

1.07 

L2.25 

89.79 

24.10 

.0258 3.7248 

.3828 
Sums of Cross Products


Xi 

X 2 

X 3 

Xi 

Xt> 

x e 


x 2 


X 3 


Xi 


x 6 


x 6 


3589 -.0125 -.0244 1.6379 .5057 

—.3469 .0352 .7920 .2173 

-.0415 -1.4278 -.4753 

.0043 .0154 

.9120 


Vi 


v% 


Vz 


.2501 
-1.5136 
.5007 
- .0421 
-.1914 
- .1586 


-9.6105 

12.8511 

2.4054 

.3945 

-9.3692 

-3.2733 


2.6691 
-2.0617 
- .9503 
-.0187 
3.4020 
1.1663 


(а) Analyze the regression of per cent nicotine (Y„) on the percentages 

of total nitrogen (Xj) and potassium (X„). percentages 

(б) Analyze the regression of cigarette burn (Y 1 ) on X 2 , X 3 and X. 

What would be the effect of omitting X 3 ? ’ 

Analyze the regression of per cent sugar (F 2 ) on Xi X? J r andr 
What would be the effect of omitting X 5 and X 6 ? ’ ’ and 
CHAPTER 16

CURVILINEAR REGRESSION: ORTHOGONAL POLYNOMIALS 

16.1. Introduction. If it is desired to fit a regression equation using
successive powers of one or more fixed variates, the methods given in
Chap. 15 can be applied. For example, suppose we had a series of annual
rainfall data (Y) for the years 1900 to 1949 and wished to determine a
regression line such as the following polynomial:

\[ Y = \alpha + \beta_1X + \beta_2X^2 + \beta_3X^3 + \beta_4X^4 + e, \]

where X = 1, 2, . . . , 50. In order to use the methods of Chap. 15,
merely set $X_i = X^i$. Snedecor 1 presents some examples using this
technique of determining a polynomial regression line.
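
The substitution $X_i = X^i$ can be sketched as follows. The data are invented for the illustration (an exact quadratic, so that the answer is known in advance), the function name is hypothetical, and the constant term is carried as a column of 1's rather than by working with deviations from the means. Exact fractions are used to sidestep the rounding problems discussed in Chap. 15.

```python
from fractions import Fraction


def fit_polynomial(xs, ys, degree):
    """Least-squares polynomial fit by treating X, X^2, ... as separate variates.

    Builds the normal equations for the columns X^0, X^1, ..., X^degree and
    solves them by Gauss-Jordan elimination in exact rational arithmetic.
    """
    n = degree + 1
    cols = [[Fraction(x) ** i for x in xs] for i in range(n)]
    # Augmented matrix: sums of squares and cross products, then sums with y.
    m = [
        [sum(ci * cj for ci, cj in zip(cols[i], cols[j])) for j in range(n)]
        + [sum(ci * Fraction(y) for ci, y in zip(cols[i], ys))]
        for i in range(n)
    ]
    for col in range(n):
        pivot = next(r for r in range(col, n) if m[r][col] != 0)
        m[col], m[pivot] = m[pivot], m[col]
        m[col] = [v / m[col][col] for v in m[col]]
        for r in range(n):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][n] for i in range(n)]


# Invented data lying exactly on Y = 1 + 2X + 3X^2.
xs = [1, 2, 3, 4, 5, 6]
ys = [1 + 2 * x + 3 * x * x for x in xs]
coefficients = fit_polynomial(xs, ys, 2)
print(coefficients)
```

With powers of X as high as the fourth, the sums of squares and cross products differ greatly in magnitude, which is one practical reason for preferring the orthogonal polynomials described next.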

If the fixed variate is equally spaced, such as in time or space, a
convenient method of curve fitting by orthogonal polynomials can be used.†
This method was first developed by Tchebysheff 2 and has since been
extended by many writers. Some of the books and articles on this
subject are references 3 to 8. The advantage of orthogonal polynomials over
the usual regression variates arises from the fact that the former are so
constructed that any term of the polynomial is independent of any other
term. This property of independence permits one to compute each
regression coefficient independently of the others and also facilitates
testing the significance of each coefficient. A summation method is
illustrated by Fisher 9 and Snedecor, 1 but the $\xi'$ polynomial method presented
by Fisher and Yates 10 with tables for X through 75 is generally more
expeditious if tests of significance of the succeeding terms of the
polynomial are desired. Another method described by Aitken 5 is also
recommended if tables are available.

† The method of orthogonal polynomials can be used for unequally spaced X's, but
it is not so advantageous for such data.

We shall consider the use of the $\xi'$ polynomials, for which a computer's
bulletin was prepared by Anderson and Houseman, who also included an
extension of the $\xi'$ tables through X = 104. 11 An example is given in
this bulletin for 62 annual United States sugar prices, 1875 to 1936.

16.2. Determination of the Polynomial. Suppose we wanted to fit
a polynomial of high enough degree to n equally spaced points so that
succeeding terms would not reduce the residual sum of squares by a
significant amount, for example,

\[ Y_j = \beta_0 + \beta_1x_j + \beta_2x_j^2 + \beta_3x_j^3 + \cdots + \beta_rx_j^r + e_j, \]

where $x_j = X_j - \bar{X} = j - \bar{j}$. Since $\bar{j} = (n + 1)/2$, we see that $x_j$ takes
on the values

\[ -(n - 1)/2, \; -(n - 3)/2, \; \ldots, \; (n - 3)/2, \; (n - 1)/2. \]

Let us attempt to replace the above equation by one of the form

\[ Y_j = \alpha_0 + \alpha_1P_{1j} + \alpha_2P_{2j} + \cdots + \alpha_rP_{rj} + e_j,\dagger \]

where the $\alpha$'s are functions of the $\beta$'s and the P's are functions of the
powers of x chosen so that each $P_{ij}$ is a function of all powers of $x_j$ ($= j - \bar{j}$)
equal to or less than i:

\[ P_{ij} = C_{i0} + C_{i1}x_j + C_{i2}x_j^2 + \cdots + C_{ii}x_j^i. \]

† We shall use the letter P instead of $\xi$ in the theory which follows.

Let us construct a set of r polynomials, that is, determine the
coefficients, C's, so that each polynomial is orthogonal to all others including
$P_0 = 1$. That is,

\[ \mathop{S}_j P_{ij}P_{kj} = 0, \qquad i = 0, 1, \ldots, r; \quad k \neq i. \]

In the future, S will refer to summation over j. For a given $P_{ij}$, we want
the i sums for k = 0, 1, . . . , i - 1 to be zero. Since each polynomial $P_{kj}$
is a function of the powers of x equal to or less than k, we can replace the
above summations by

\[ SP_{ij}x_j^k = 0, \qquad k = 0, 1, 2, \ldots, i - 1. \]

That is, we have i equations to determine the values of the (i + 1)
coefficients: $C_{i0}, C_{i1}, \ldots, C_{ii}$. Hence it is necessary to fix the value of one
constant; we shall set $C_{ii} = 1$.

Using the definition of $P_{ij}$ in terms of the x's, we have for a given k

\[
S(C_{i0} + C_{i1}x_j + \cdots + C_{i,i-1}x_j^{i-1} + x_j^i)x_j^k
= \sum_{l=0}^{i-1} C_{il}Sx_j^{l+k} + Sx_j^{i+k} = 0.
\]

We know that $Sx^m = 0$ when m is odd. Hence we have the following
equations for i odd:

\[
\begin{array}{cl}
k & \text{Equation} \\
0 & C_{i,0}n + C_{i,2}Sx^2 + \cdots + C_{i,i-1}Sx^{i-1} = 0 \\
2 & C_{i,0}Sx^2 + C_{i,2}Sx^4 + \cdots + C_{i,i-1}Sx^{i+1} = 0 \\
\vdots & \\
i - 1 & C_{i,0}Sx^{i-1} + C_{i,2}Sx^{i+1} + \cdots + C_{i,i-1}Sx^{2(i-1)} = 0 \\[6pt]
1 & C_{i,1}Sx^2 + C_{i,3}Sx^4 + \cdots + C_{i,i-2}Sx^{i-1} + Sx^{i+1} = 0 \\
3 & C_{i,1}Sx^4 + C_{i,3}Sx^6 + \cdots + C_{i,i-2}Sx^{i+1} + Sx^{i+3} = 0 \\
\vdots & \\
i - 2 & C_{i,1}Sx^{i-1} + C_{i,3}Sx^{i+1} + \cdots + C_{i,i-2}Sx^{2i-4} + Sx^{2i-2} = 0
\end{array}
\]

The $\frac{1}{2}(i + 1)$ equations for k even involve only the coefficients
$\{C_{i,0}, C_{i,2}, \ldots, C_{i,i-1}\}$, and since these equations have no constant term, all the
coefficients are zero. But the $\frac{1}{2}(i - 1)$ equations for k odd do have
nonzero constant terms, and we can solve for the coefficients

\[ \{C_{i,1}, C_{i,3}, \ldots, C_{i,i-2}\}. \]

Hence we conclude that when i is odd (i = 1, 3, . . .),

\begin{align*}
P_1 &= x, \\
P_3 &= x^3 + C_{3,1}x, \\
P_i &= x^i + C_{i,i-2}x^{i-2} + \cdots + C_{i,3}x^3 + C_{i,1}x.
\end{align*}

Similarly it can be shown that when i is even (i = 0, 2, 4, . . .),

\begin{align*}
P_0 &= 1, \\
P_2 &= x^2 + C_{2,0}, \\
P_4 &= x^4 + C_{4,2}x^2 + C_{4,0}, \\
P_i &= x^i + C_{i,i-2}x^{i-2} + \cdots + C_{i,2}x^2 + C_{i,0}.
\end{align*}

Note that we have omitted the j subscript for convenience.

From these results we see that

\[ xP_i = P_{i+1} + D_{i,1}P_{i-1} + D_{i,2}P_{i-3} + \cdots, \]

where the D's are functions of the C's. For example,

\begin{align*}
xP_1 &= x^2 = P_2 - C_{2,0}, \\
xP_2 &= x^3 + C_{2,0}x = P_3 + (C_{2,0} - C_{3,1})x, \\
D_{1,1} &= -C_{2,0}, \qquad D_{2,1} = C_{2,0} - C_{3,1}, \qquad D_{i,l} = 0 \text{ for } l > 1.
\end{align*}

Since $(xP_i)$ is a function of powers of x equal to or less than (i + 1), the
orthogonality requirement, $SP_lx^k = 0$ for l > k, guarantees that

\[ SxP_iP_l = 0 \]

for l > (i + 1) and, because of the symmetry of i and l, for i > (l + 1).
Also since $P_i^2$ is a function of even powers of x, $SxP_i^2 = 0$. Hence the only
cases for which $SxP_iP_l \neq 0$ are $l = i \pm 1$. And when $l = i \pm 1$, all of
the terms except those for $P_{i+1}$ and $P_{i-1}$ vanish. That is,

\begin{align*}
SxP_iP_{i+1} &= SP_{i+1}^2 \quad \text{or} \quad SxP_{i-1}P_i = SP_i^2, \\
SxP_iP_{i-1} &= D_{i,1}SP_{i-1}^2.
\end{align*}

Hence

\[ D_{i,1} = \frac{SP_i^2}{SP_{i-1}^2} \]

and all other D's = 0.

Therefore we now have a recursion formula to determine higher-order
polynomials in terms of lower-order ones:

\[ P_{i+1} = xP_i - \frac{SP_i^2}{SP_{i-1}^2}P_{i-1}. \]
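
The recursion lends itself to direct computation. The sketch below is an illustration with a hypothetical function name; it evaluates the polynomial values at the n points in exact fractions, starting from $P_0 = 1$ and $P_1 = x$.

```python
from fractions import Fraction


def orthogonal_polynomials(n, max_degree):
    """Values of P_0, ..., P_max_degree at n equally spaced points.

    Uses x_j = j - (n + 1)/2 and the recursion
    P_{i+1} = x P_i - (S P_i^2 / S P_{i-1}^2) P_{i-1}.
    """
    x = [Fraction(2 * j - n - 1, 2) for j in range(1, n + 1)]
    polys = [[Fraction(1)] * n, x[:]]
    while len(polys) <= max_degree:
        p_i, p_prev = polys[-1], polys[-2]
        d = sum(v * v for v in p_i) / sum(v * v for v in p_prev)
        polys.append([xj * a - d * b for xj, a, b in zip(x, p_i, p_prev)])
    return polys[: max_degree + 1]


P = orthogonal_polynomials(5, 4)
for i, values in enumerate(P):
    print(i, [str(v) for v in values])
```

For n = 5 the rows printed for $P_3$ and $P_4$ are fractions with denominators 5 and 35, which is the motivation for the integral multiples introduced below.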

Since $P_1 = x$, $P_0 = 1$, $SP_1^2 = Sx^2 = n(n^2 - 1)/12$, and $SP_0^2 = n$,

$$P_2 = x^2 - \frac{n^2 - 1}{12}.$$

Similarly

$$SP_2^2 = Sx^4 - n\left(\frac{n^2 - 1}{12}\right)^2 = \frac{n(n^2 - 1)(n^2 - 4)}{180},$$

$$P_3 = x^3 - \left(\frac{n^2 - 1}{12} + \frac{n^2 - 4}{15}\right)x = x^3 - \frac{3n^2 - 7}{20}\,x.$$

A general formula for $SP_i^2$ is

$$SP_i^2 = \frac{n(n^2 - 1)(n^2 - 4) \cdots (n^2 - i^2)\,(i!)^2\,[(i - 1)!]^2}{4(2i + 1)\,[(2i - 1)!]^2}.$$

Hence

$$D_{i,1} = \frac{SP_i^2}{SP_{i-1}^2} = \frac{(n^2 - i^2)\,i^2}{4(4i^2 - 1)}.$$

A proof of these results is beyond the scope of this book but can be found
in most of the sources cited in Sec. 16.1. Hence

$$P_4 = x^4 - \left[\frac{3n^2 - 7}{20} + \frac{9(n^2 - 9)}{140}\right]x^2 + \frac{9(n^2 - 1)(n^2 - 9)}{(12)(140)}$$

$$\phantom{P_4} = x^4 - \frac{3n^2 - 13}{14}\,x^2 + \frac{3(n^2 - 1)(n^2 - 9)}{560}.$$
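The recursion lends itself to direct computation. The following is a minimal sketch, not part of the original text, that builds the polynomial values on $n$ equally spaced points from $P_{i+1} = xP_i - D_{i,1}P_{i-1}$ with $D_{i,1} = i^2(n^2 - i^2)/[4(4i^2 - 1)]$. The function names `orthogonal_polynomials` and `sum_sq` are illustrative assumptions; exact fractions are used so that no rounding enters.

```python
from fractions import Fraction


def orthogonal_polynomials(n, max_degree):
    """Values of P_0 .. P_max_degree on n equally spaced, centred points.

    Uses P_{i+1} = x*P_i - D_i*P_{i-1} with
    D_i = i^2 (n^2 - i^2) / (4 (4 i^2 - 1)).
    Row i of the result holds P_i at x_1, ..., x_n.
    """
    # Centred abscissae x_j = j - (n + 1)/2, kept exact.
    xs = [Fraction(2 * j - n - 1, 2) for j in range(1, n + 1)]
    rows = [[Fraction(1)] * n]              # P_0 = 1
    if max_degree >= 1:
        rows.append(list(xs))               # P_1 = x
    for i in range(1, max_degree):
        d = Fraction(i * i * (n * n - i * i), 4 * (4 * i * i - 1))
        rows.append([x * p - d * q
                     for x, p, q in zip(xs, rows[i], rows[i - 1])])
    return rows


def sum_sq(row):
    """S(P_i^2): the sum of squared polynomial values."""
    return sum(v * v for v in row)


# Print each polynomial for n = 5 together with its sum of squares.
for degree, row in enumerate(orthogonal_polynomials(5, 4)):
    print(degree, [str(v) for v in row], "S(P^2) =", sum_sq(row))
```

For $n = 5$ the sums of squares produced this way can be compared with the general formula for $SP_i^2$ given above.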

The reader will note that these polynomial values, $P_{ij}$, will be fractions.
For convenience of computing and of presentation of tables of the
polynomial values, it is desirable to use only integral values for the $P_{ij}$.
The regular polynomial values have been multiplied by a constant, $\lambda_i$,
in references 10 and 11, so chosen that the values of $P'_{ij} = \lambda_i P_{ij}$ are
integers reduced to lowest terms. Hence

$$Y_j = \alpha'_0 + \alpha'_1 P'_{1j} + \alpha'_2 P'_{2j} + \cdots + \alpha'_r P'_{rj} + e_j,$$

where $\alpha'_i = \alpha_i/\lambda_i$ and $\alpha_0 = \alpha'_0$.

Example 16.1. As an example, consider the construction of the $P_{ij}$
values for $n = 5$ (refer to Example 6.4 on page 62). Obviously, for five
points, a fourth-degree polynomial will fit the data exactly; hence, only
$P_0\ (= 1)$, $P_1$, $P_2$, $P_3$, and $P_4$ exist. Using the formulas already indicated,
we have $x_j = j - \bar{j} = j - 3$, and

$$P_{1j} = x_j, \qquad P_{2j} = x_j^2 - 2, \qquad P_{3j} = x_j^3 - \tfrac{17}{5}x_j, \qquad P_{4j} = x_j^4 - \tfrac{31}{7}x_j^2 + \tfrac{72}{35}.$$

The polynomial values are given in the table below.

  j         P_1j    P_2j    P_3j     P_4j     P'_3j   P'_4j
  1          -2       2     -6/5     12/35     -1       1
  2          -1      -1     12/5    -48/35      2      -4
  3           0      -2       0      72/35      0       6
  4           1      -1    -12/5    -48/35     -2      -4
  5           2       2      6/5     12/35      1       1
  S(P^2)     10      14     72/5    288/35     10      70

We note that $P'_{1j} = P_{1j}$ and $P'_{2j} = P_{2j}$; that is, $\lambda_1 = \lambda_2 = 1$. Also
$\lambda_3 = \tfrac{5}{6}$ and $\lambda_4 = \tfrac{35}{12}$, so
that $P'_{3j} = (5P_{3j})/6$ and $P'_{4j} = (35P_{4j})/12$.
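The multipliers $\lambda_i$ can be found mechanically. The sketch below is an illustration under stated assumptions, not the procedure used in references 10 and 11: it takes the fractional values of $P_3$ and $P_4$ for $n = 5$ and finds the smallest positive multiplier that turns each row into integers with no common factor. The helper name `integer_multiplier` is hypothetical.

```python
from fractions import Fraction
from functools import reduce
from math import gcd


def integer_multiplier(values):
    """Smallest positive lambda such that every lambda*value is an integer
    and the resulting integers share no common factor."""
    # Least common multiple of the denominators clears the fractions.
    lcm_den = reduce(lambda a, b: a * b // gcd(a, b),
                     (v.denominator for v in values), 1)
    scaled = [int(v * lcm_den) for v in values]
    # Dividing by the greatest common divisor reduces to lowest terms.
    common = reduce(gcd, (abs(s) for s in scaled), 0)
    return Fraction(lcm_den, common)


P3 = [Fraction(-6, 5), Fraction(12, 5), Fraction(0),
      Fraction(-12, 5), Fraction(6, 5)]
P4 = [Fraction(12, 35), Fraction(-48, 35), Fraction(72, 35),
      Fraction(-48, 35), Fraction(12, 35)]

for name, row in (("P3", P3), ("P4", P4)):
    lam = integer_multiplier(row)
    print(name, "lambda =", lam, "scaled =", [int(lam * v) for v in row])
```

The multipliers obtained agree with the $\lambda_3$ and $\lambda_4$ quoted in the example.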

16.3. Estimating the Parameters, $\alpha'_i$ and $\sigma^2$. We are given the regression
equation to fit to $n$ equally spaced points:

$$Y_j = \sum_{i=0}^{r} \alpha'_i P'_{ij} + e_j = \eta_j + e_j, \qquad r + 1 < n,$$

where the $\{e_j\}$ are assumed NID$(0, \sigma^2)$ and

$$\hat{Y}_j = A'_0 + A'_1 P'_{1j} + \cdots + A'_r P'_{rj}$$

with $A'_i$ the estimate of $\alpha'_i$. The least-squares equations to determine the $A'_i$ are
quite simple, because of the orthogonal property of the $P'_{ij}$'s. In fact
$A'_0 = \bar{Y}$, and

$$A'_i = \frac{S(YP'_i)}{S(P'_i)^2}.$$

The polynomial tables in reference 11 present values of the $P'_{ij}$, $\lambda_i$, and
$S(P'_i)^2$ for values of $n$ through 104. In these tables, P- = 

The sum of squares due to the regression is given by

$$\text{SSR} = n\bar{Y}^2 + A'_1 S(YP'_1) + A'_2 S(YP'_2) + \cdots + A'_r S(YP'_r),$$

where each term is the independent reduction due to the successive
polynomials: mean, $P'_1, P'_2, \ldots, P'_r$. Also

$$\text{SSE} = SY^2 - \text{SSR} = Sy^2 - \sum_{i=1}^{r} \text{SSR}_i,$$

where

$$\text{SSR}_i = A'_i S(YP'_i) = (SYP'_i)^2 / S(P'_i)^2.$$

$s^2 = \text{SSE}/(n - r - 1)$ is an unbiased estimate of $\sigma^2$. Hence we can set
up the following analysis of variance:


  Source of variation               Degrees of freedom    Mean square
  Linear regression (P'_1)                  1             SSR_1 = (SYP'_1)^2 / S(P'_1)^2
  Quadratic regression (P'_2)               1             SSR_2 = (SYP'_2)^2 / S(P'_2)^2
  . . .                                   . . .           . . .
  rth-degree regression (P'_r)              1             SSR_r = (SYP'_r)^2 / S(P'_r)^2
  Error                                 n - r - 1         s^2 = SSE/(n - r - 1)


In general we do not know what degree ($r$) should be used; so tests of
successive terms are made until there is no material reduction in $s^2$ by
the use of additional terms.†

It should be pointed out that the equation

$$\hat{Y} = A'_0 + A'_1 P'_1 + \cdots + A'_r P'_r$$

can be used for prediction purposes. However, if the original polynomial
form of the equation is needed, the $P'_i$ will have to be replaced by their
equivalent polynomial function,

$$P'_i = \lambda_i(C_{i0} + C_{i1}x + C_{i2}x^2 + \cdots + x^i).$$

This is adequately explained in reference 11.

† This procedure introduces a slight bias into the estimate of $\sigma^2$, because the
procedure is unbiased only if the model is postulated in advance ($r$ is known in advance).
We doubt that this bias is very serious, but some theoretical or empirical studies of
this point should be made.

Example 16.2. In order to compare the number of spears per asparagus
plant for male and female plants, the California Agricultural Experiment
Station at Davis obtained the following data on the differences ($Y$)
between the mean number of spears per plant for each of the years
1925 to 1936, here indicated as years 1 to 12 (see Fig. 16.1). The first
four sets of polynomial values are given below the yields.

  Year (j)     1     2     3     4     5     6     7     8     9    10    11    12   Sum of squares
  Y          1.1   7.1  11.0  12.6  14.7  19.9  25.1  23.9  23.1  23.6  26.0  24.6     4,516.43
  P'_1       -11    -9    -7    -5    -3    -1     1     3     5     7     9    11       572
  P'_2        55    25     1   -17   -29   -35   -35   -29   -17     1    25    55    12,012
  P'_3       -33     3    21    25    19     7    -7   -19   -25   -21    -3    33     5,148
  P'_4        33   -27   -33   -13    12    28    28    12   -13   -33   -27    33     8,008

Fig. 16.1. Changes in differences between mean number of male and female asparagus
spears, 1925 to 1936.

The $\lambda$'s are $\lambda_1 = 2$, $\lambda_2 = 3$, $\lambda_3 = \tfrac{2}{3}$, $\lambda_4 = \tfrac{7}{24}$. The computed results are given below.

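$$\begin{array}{lll}
SY = 212.7, & A'_0 = 17.725, & n\bar{Y}^2 = 3{,}770.11, \\
SP'_1Y = 602.1, & A'_1 = 1.05262, & \text{SSR}_1 = 633.78, \\
SP'_2Y = -1{,}025.7, & A'_2 = -.0853896, & \text{SSR}_2 = 87.58, \\
SP'_3Y = -19.5, & A'_3 = -.00378788, & \text{SSR}_3 = .07, \\
SP'_4Y = 71.7. & A'_4 = .00895355. & \text{SSR}_4 = .64.
\end{array}$$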
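These quantities follow directly from $A'_i = S(YP'_i)/S(P'_i)^2$ and $\text{SSR}_i = (SYP'_i)^2/S(P'_i)^2$. As a check, the short sketch below recomputes them from the yields and the tabulated polynomial values of this example; the function name `fit_orthogonal` is an assumption made for illustration.

```python
# Yields and integer polynomial values for n = 12, as tabulated in Example 16.2.
Y = [1.1, 7.1, 11.0, 12.6, 14.7, 19.9, 25.1, 23.9, 23.1, 23.6, 26.0, 24.6]
P = {
    1: [-11, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11],
    2: [55, 25, 1, -17, -29, -35, -35, -29, -17, 1, 25, 55],
    3: [-33, 3, 21, 25, 19, 7, -7, -19, -25, -21, -3, 33],
    4: [33, -27, -33, -13, 12, 28, 28, 12, -13, -33, -27, 33],
}


def fit_orthogonal(y, polys):
    """Return (A'_0, {i: (S(Y P'_i), A'_i, SSR_i)})."""
    a0 = sum(y) / len(y)                             # A'_0 is the mean of Y
    out = {}
    for i, p in polys.items():
        syp = sum(yj * pj for yj, pj in zip(y, p))   # S(Y P'_i)
        spp = sum(pj * pj for pj in p)               # S(P'_i)^2
        out[i] = (syp, syp / spp, syp * syp / spp)
    return a0, out


a0, terms = fit_orthogonal(Y, P)
print("A'_0 =", round(a0, 3), " n*Ybar^2 =", round(len(Y) * a0 ** 2, 2))
for i, (syp, a, ssr) in terms.items():
    print(f"i={i}: S(YP')={syp:9.1f}  A'={a:.8f}  SSR={ssr:7.2f}")
```

Because the polynomials are orthogonal, each coefficient is computed on its own; adding a higher-degree term never changes the coefficients already found.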










The analysis of variance is as follows: 


  Source of variation                           Degrees of    Sum of     Mean
                                                 freedom      squares    square
  Sy^2                                             11          746.32
  Linear regression (P'_1)                          1          633.78    633.78
  Deviations from linear                           10          112.54     11.25
  Quadratic regression (P'_2)                       1           87.58     87.58
  Deviations from quadratic                         9           24.96      2.77
  Cubic regression (P'_3)                           1             .07       .07
  Deviations from cubic                             8           24.89      3.11
  Cubic and quartic regression (P'_3, P'_4)         2             .71       .36
  Deviations from quartic                           7           24.25      3.46


On the basis of these results, we conclude that the regression is quadratic
and that the best estimate of $\sigma^2$ is 2.77 with 9 degrees of freedom.
It should be indicated that successive terms of the polynomial are tested
by use of the deviations from regression. For example, we test for the
existence of linear regression by use of

$$F' = (633.78)/(11.25) = 56.34$$

with (1,10) degrees of freedom. This is not an exact test, because we find
out later that the denominator is an overestimate of $\sigma^2$. However, the
test is certainly on the safe side; that is, if $F'$ is significant, the $F$ by use
of the best estimate of $\sigma^2$ will certainly be significant. Hence we can use
$F'$ as a guide to the advisability of considering any form of regression.
It is always advisable to compute at least one more polynomial value
after the first nonsignificant one. That is, we might have stopped with
$P'_3$, but it is advisable to try out $P'_4$ to make sure it is not significant.
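The successive "deviations from regression" lines are obtained by peeling one reduction at a time off the total. The sketch below, with the hypothetical helper `successive_deviations`, starts from the rounded sums of squares quoted in the example. It treats the cubic and quartic terms one at a time instead of pooling them, and it divides by the unrounded mean square, so its $F'$ for the linear term differs slightly from the 56.34 obtained above with the rounded divisor 11.25.

```python
total_ss = 746.32                      # Sy^2, with 11 degrees of freedom
total_df = 11
ssr = [633.78, 87.58, 0.07, 0.64]      # SSR_1 .. SSR_4 from Example 16.2


def successive_deviations(total_ss, total_df, reductions):
    """Rows of (degree, SSR_i, deviations SS, deviations df, deviations MS, F')."""
    dev_ss, dev_df = total_ss, total_df
    rows = []
    for degree, r in enumerate(reductions, start=1):
        dev_ss -= r                    # remove this term's reduction
        dev_df -= 1                    # each term uses one degree of freedom
        ms = dev_ss / dev_df
        rows.append((degree, r, dev_ss, dev_df, ms, r / ms))
    return rows


for degree, r, dss, ddf, ms, f in successive_deviations(total_ss, total_df, ssr):
    print(f"degree {degree}: SSR={r:7.2f}  dev SS={dss:7.2f}  "
          f"df={ddf:2d}  MS={ms:6.2f}  F'={f:6.2f}")
```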
Using the quadratic regression equation,

$$\begin{aligned}
\hat{Y} &= 17.725 + 1.053P'_1 - .0854P'_2 \\
&= 17.725 + 2.106(j - 6.5) - .0854[3(j - 6.5)^2 - 35.75] \\
&= 17.725 + 2.106(j - 6.5) - .0854(3j^2 - 39j + 91) \\
&= -3.735 + 5.437j - .2562j^2 \\
&= (1.45, 6.11, 10.27, 13.91, 17.04, 19.66, 21.77, 23.36, 24.45, 25.02, 25.07, 24.61).
\end{aligned}$$
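The conversion from the orthogonal form to an ordinary polynomial in $j$ can be checked numerically. The sketch below uses the rounded coefficients 1.053 and .0854 that appear in the displayed equation, together with $\lambda_1 = 2$, $\lambda_2 = 3$, and $P_2 = x^2 - (n^2 - 1)/12$. The variable names are illustrative, and the fitted values may differ from the printed ones in the last decimal because of hand rounding in the original.

```python
a0, a1, a2 = 17.725, 1.053, -0.0854   # rounded A'_0, A'_1, A'_2 from the text
n = 12
lam1, lam2 = 2, 3                     # lambda_1 and lambda_2 for n = 12
xbar = (n + 1) / 2                    # centring constant, 6.5
c2 = (n * n - 1) / 12                 # P_2 = x^2 - c2


def fitted(j):
    """Y-hat from the orthogonal form A'_0 + A'_1 P'_1 + A'_2 P'_2."""
    x = j - xbar
    p1 = lam1 * x                     # P'_1 = lambda_1 * x
    p2 = lam2 * (x * x - c2)          # P'_2 = lambda_2 * (x^2 - c2)
    return a0 + a1 * p1 + a2 * p2


# Collect powers of j: Y-hat = b0 + b1*j + b2*j^2.
b2 = a2 * lam2
b1 = a1 * lam1 - 2 * xbar * a2 * lam2
b0 = a0 - a1 * lam1 * xbar + a2 * lam2 * (xbar * xbar - c2)

print("b0, b1, b2 =", round(b0, 4), round(b1, 4), round(b2, 4))
print([round(fitted(j), 2) for j in range(1, n + 1)])
```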

These results are displayed graphically in Fig. 16.1.

EXERCISES

16.1. (a) Prove that $A'_i$ is an unbiased estimate of $\alpha'_i$, $\sigma^2(A'_i) = \sigma^2/S(P'_i)^2$,
and $\sigma(A'_i A'_l) = 0$ for $i \neq l$.

(b) Prove that $s^2$ is an unbiased estimate of $\sigma^2$.

(c) Prove that SSR is independent of SSE.

16.2. Derive the formula for $P_5$.

16.3. Compute the polynomial values, $P'_i$, $i = 0, 1, \ldots, 5$, for $n = 6$.

16.4. (a) Determine the standard errors of $A'_1$ and $A'_2$ in Example 16.2,
and then set up 95 per cent confidence limits for the true regression
coefficients $\alpha'_1$ and $\alpha'_2$.

(b) Do the same for $A_1$ and $A_2$ as estimates of $\alpha_1$ and $\alpha_2$.

(c) What is the standard error of $\hat{Y}_j$?

16.5. Given the following data on the output ($Y_1$) in pounds per man-hour
for General Motors employees and the aggregate weight of all
General Motors automobile products in millions of pounds ($Y_2$) for
the years 1929 to 1941.


  Year   1929   1930   1931   1932   1933   1934   1935   1936   1937   1938   1939   1940   1941
  Y_1   12.55  13.41  14.79  14.56  15.05  15.67  17.60  18.30  18.61  19.69  19.87  20.81  21.83
  Y_2   5,632  3,537  3,175  1,692  2,477  3,812  5,338  6,617  6,868  4,020  5,618  7,492  8,631


(a) Use a set of orthogonal polynomial tables (reference 10 or 11), or 
construct such a set for n = 13 to determine the best fitting polynomial 
to fit to each set of data. 

(b) If the same degree equation is to be used for both sets of data, 
what degree would you advise? 

(c) How would you determine the regression of $Y_2$ on $Y_1$, both adjusted
for time trends? What theoretical difficulties do you see in the
determination of the regression of $Y_2$ on $Y_1$?

16.6. Snedecor¹ presents the following problem on the average heights
of sunflowers:


  Week       1    2    3    4    5    6    7    8    9   10   11   12
  Height    18   36   68   98  131  170  206  228  247  250  254  254


(a) What degree of polynomial should be used to estimate the height
after $j$ weeks?

(b) Plot the actual and estimated heights. 

(c) Determine the standard errors of the regression coefficients. 

16.7. We know that the graph of the equation

$$Y = 10 - 2X + 3X^2 - X^3$$

passes through the points

(1,10), (2,10), (3,4), (4,-14), (5,-50), (6,-110).

Use these points to derive the prediction equation

$$\hat{Y} = A'_0 + A'_1P'_1 + A'_2P'_2 + A'_3P'_3,$$

and show that $\hat{Y} = Y$ in this case.

16.8. The student is advised to select some data from his own field of
research for which a polynomial prediction equation is useful and to
carry out the necessary computations to determine $\hat{Y}$.

16.9. (a) Use the general definition of orthogonal polynomials,

$$\underset{j}{S}\,P_{ij}x_j^k = 0, \qquad i > k,$$

to derive a complete set of orthogonal polynomial values for the following
X's: 2, 3, 6, 9.

(b) When would it be useful to derive a set of orthogonal polynomial
values for unequally spaced X's?

CHAPTER 17

LEAST SQUARES FOR EXPERIMENTAL DESIGN MODELS 

17.1. Introduction. The least-squares methods outlined in the preceding
chapters can be used to analyze the effects of different treatments
in a planned experiment. We shall consider a variety of experimental
results in succeeding chapters, among these being the following:

(i) The yields of a crop for different varieties and different fertilizers. 

(ii) The gains of weight of animals for different rations. 

(iii) The effects of different temperatures on the plating of steel wire. 

(iv) The effects of varying periods of cold storage on the tenderness and 
flavor of beef. 

(v) The toxicity of various chemicals on insects. 

In the discussion which follows, the word treatments will be used to
apply to the varieties, fertilizers, rations, or chemicals used in the
experiment. And the word yield will be used to describe the result of the
experiment, whether it be bushels of oats, pounds of grain, or per cent
of insects killed by a spray. It is further assumed that the yields were
obtained by applying the treatments to a variety of experimental units,
such as plots of ground, animals, or pieces of wire.

When the experiment is set up, some restrictions may be placed on the
assignment of the treatments to the experimental units. These restrictions
are included in the general topic of experimental design and are
extensively discussed in references 1 to 4. Four important types of
designs will be discussed in the next two chapters.

(i) Completely randomized designs. No restrictions are placed on the 
assignment of the treatments to the experimental units, except that each 
treatment shall be used a specified number of times. 

(ii) Randomized complete-blocks designs. The experimental units are 
divided into relatively homogeneous sets, called blocks, the number of 
experimental units per block being equal to the number of treatments 
and each treatment assigned to one experimental unit in each block. 

(iii) Latin-square designs. The experimental units are arranged in 
homogeneous blocks in two directions, called rows and columns, each 
treatment appearing in each row and column once and only once. 

(iv) Incomplete-blocks designs. The number of treatments is greater
than the number of experimental units which can be grouped in each
block.

The first three designs, which can be classified as complete-blocks designs,
will be discussed in Chap. 18 and the incomplete-blocks designs in
Chap. 19.

Then in Chap. 20 different combinations of treatments will be discussed
as factorial designs. These factorial designs can be used with any
of the above field designs and are discussed in detail in references 1, 2,
and 5.

Some attention will also be paid to the analysis of data collected in
sample surveys. The basic data are collected from sampling units, such
as people or plots of ground; after the sampling units have been contacted,
they may be subdivided into subclasses according to some of the
information secured in the survey. Or the population may be divided
into classes before the survey is started and a fixed number of sampling
units contacted in each class. Since most surveys are set up with
several sources of random variation, they will be discussed in detail in
Chap. 22. A few examples are included in Sec. 18.2, where the sampling
design corresponds to a completely randomized experimental design.
In the discussion which follows we shall refer only to the analysis of
experimental results, but the reader should understand that survey
results can be analyzed by similar methods if the assumptions apply.

When an experimenter sets up an experiment to examine the yielding
ability of different treatments, he usually desires the following
information:

(i) A test of whether there are any real treatment effects. 

(ii) Average yield or effect of each treatment or the difference in yields 
between two different treatments. 

(iii) Confidence limits for these average yields or differences. 

(iv) A measure of the precision of the experiment. 

(v) An estimate of the efficiency of the particular experimental design 
as compared with some alternative design. 

The experimenter should have some ideas concerning the pattern of
response of the experimental material to the treatments. In order to
use the method of least squares to analyze the data, we must assume that
the yield for a given experimental unit can be represented as a linear
function of the treatment effect, block effect, and error for that particular
experimental unit, that is,

Yield = (general mean) + (treatment effect) + (block effect) + (error),

where the mean measures the average yield for this experimental setup,
the treatment and block effects add or subtract from the mean depending
on how they affect the yield, and the error measures the failure of these
effects to predict the exact yield. The error is a measure of the uncontrolled
sources of variation in the experiment, such as differences in the
yielding abilities of experimental units in the same block, and errors in
applying the treatments to the experimental units and in collecting
the data.

If the errors in the above experimental model can be assumed to be
noncorrelated and to have the same amount of variability for every treatment
and block, the method of least squares will give unbiased minimum variance
estimates of the effects. In addition, if the errors are normally
distributed, the $F$ and $t$ tests can be used to test for the significance of
treatment differences, and the usual confidence limits for means can be
used. A more detailed discussion of these assumptions is presented in
the next section.

As indicated in previous chapters, a computational device which
furnishes the requisite data for estimating variances and making tests
of significance for the method of least squares is the analysis of variance.
The analysis of variance is a simple arithmetic device for dividing the
total sum of squares into separate orthogonal parts, and if the
least-squares assumptions are met, these sums of squares can be used to make
tests and to set up confidence limits.

In Chap. 14 it was shown that if we had $r$ fixed variates split into two
groups of $k$ and $(r - k)$ variates each, the prediction equation was

$$Y = \mu + \sum_{i=1}^{k} \beta_i x_i + \sum_{i=k+1}^{r} \beta_i x_i + e.$$


The analysis of variance was: 


  Source of variation                       Degrees of    Sum of        Mean square
                                             freedom      squares
  First k fixed variates                        k         R_k           R_k/k
  Added reduction by last (r - k) fixed
    variates                                  r - k       R_r - R_k     (R_r - R_k)/(r - k)
  Error                                     n - r - 1     SSE           s^2 = SSE/(n - r - 1)


We can test the null hypothesis that the last $(r - k)$ $\beta$'s are 0, without
assuming anything about the first $k$ $\beta$'s, by use of $F = (R_r - R_k)/[(r - k)s^2]$,
with $(r - k)$ and $(n - r - 1)$ degrees of freedom. We generally wish
to eliminate the effect of the mean from the regression first and do not
want to test if the mean is 0. Hence everything has been indicated as
deviations from the mean with $\mu$ estimated by $\bar{Y}$. When it is desired to
test the null hypothesis that any subset of $(r - k)$ of the $\beta_i$'s are all zero,
we would determine the reduction, $R_r$, due to all $r$ fixed variates and then
the reduction $R_k$ due to the other $k$ fixed variates alone. The difference
between the two reductions furnishes the necessary data to test the
desired hypothesis by use of $F$, as indicated above. If the two sets of
fixed variates happen to be orthogonal, then the first $k$ fixed variates
can be tested by use of $F = R_k/ks^2$. This merely means that the added
reduction for these $k$ fixed variates after fitting the last $(r - k)$ fixed
variates alone is the same as $R_k$ or, conversely, that the reduction due to
fitting the last $(r - k)$ fixed variates alone is the same as $R_r - R_k$.
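The added-reduction test can be sketched in a few lines. The code below is an illustration, not the computing routine of Chap. 14: it solves the normal equations (in deviations from the means) by elimination and forms $F = (R_r - R_k)/[(r - k)s^2]$. The names `solve`, `reduction`, and `extra_ss_f` are assumptions. For data it reuses the asparagus yields of Example 16.2 with $P'_1$ as the first variate and $P'_2$ as the added one.

```python
def solve(a, b):
    """Solve a x = b by Gauss-Jordan elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [u - f * v for u, v in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def reduction(y, columns):
    """Reduction in Sy^2 (about the mean) due to regression on `columns`."""
    n = len(y)
    ybar = sum(y) / n
    yd = [v - ybar for v in y]
    xd = [[v - sum(c) / n for v in c] for c in columns]   # deviations from means
    sxx = [[sum(p * q for p, q in zip(ci, cj)) for cj in xd] for ci in xd]
    sxy = [sum(p * q for p, q in zip(ci, yd)) for ci in xd]
    b = solve(sxx, sxy)
    return sum(bi * si for bi, si in zip(b, sxy))


def extra_ss_f(y, first, last):
    """F and its degrees of freedom for H0: coefficients of `last` are all zero."""
    n, k, r = len(y), len(first), len(first) + len(last)
    ybar = sum(y) / n
    total = sum((v - ybar) ** 2 for v in y)
    r_k = reduction(y, first)
    r_r = reduction(y, first + last)
    s2 = (total - r_r) / (n - r - 1)
    return (r_r - r_k) / ((r - k) * s2), (r - k, n - r - 1)


Y = [1.1, 7.1, 11.0, 12.6, 14.7, 19.9, 25.1, 23.9, 23.1, 23.6, 26.0, 24.6]
P1 = [-11, -9, -7, -5, -3, -1, 1, 3, 5, 7, 9, 11]
P2 = [55, 25, 1, -17, -29, -35, -35, -29, -17, 1, 25, 55]
print(extra_ss_f(Y, [P1], [P2]))
```

Since $P'_1$ and $P'_2$ are orthogonal, the added reduction here equals the quadratic reduction taken alone, which is the special case described in the last sentences of the paragraph above.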

In a planned experiment, the first set of fixed variates could represent 
block effects and the second set treatment effects. As will be explained 
in more detail in later sections, the X’s are either 1 or 0 for experimental 
data and the regression coefficients are the effects. Many experiments 
are designed so that the block and treatment effects are orthogonal; in 
other words, each part of the analysis of variance can be used to test a 
particular set of effects. When the treatment and block effects are 
orthogonal, the computations are simplified and the estimates of the 
effects are generally more efficient. However, there are circumstances 
for which orthogonality is not possible or advisable. Some of these 
are discussed in Chap. 19. 

The reader is advised to read reference 6 for a discussion of the general
theory of tests of significance for analysis-of-variance problems and
reference 7 for the power of these significance tests. In the chapters
which follow, it will be shown that the expected value of the reduction
in the total sum of squares attributable to certain treatments used in the
experiments can be written as

$$(p - 1)\sigma^2 + r\sum_{i=1}^{p} \tau_i^2,$$

where the $p$ $\{\tau_i\}$ are the true treatment effects, each treatment being used
$r$ times in the experiment, and $\sigma^2$ is the population variance. Tang⁷ has
prepared tables of the Type II error (1 - power) in terms of the treatment
and error degrees of freedom and a constant $\phi$, where

$$\phi = \sqrt{r\sum_i \tau_i^2 / p\sigma^2}.$$
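To see how these two quantities behave, the following sketch evaluates the expected reduction and the constant $\phi$ for a set of made-up treatment effects. The effects, the number of replications, and $\sigma^2$ are hypothetical values chosen only for illustration, and the function names are assumptions; the tables themselves are not reproduced.

```python
from math import sqrt


def expected_reduction(effects, r, sigma2):
    """(p - 1) * sigma^2 + r * sum(tau_i^2) for p treatments, r replications."""
    p = len(effects)
    return (p - 1) * sigma2 + r * sum(t * t for t in effects)


def tang_phi(effects, r, sigma2):
    """phi = sqrt(r * sum(tau_i^2) / (p * sigma^2))."""
    p = len(effects)
    return sqrt(r * sum(t * t for t in effects) / (p * sigma2))


# Hypothetical case: three treatment effects summing to zero, 4 replications.
tau = [-2.0, 0.0, 2.0]
print(expected_reduction(tau, r=4, sigma2=4.0))
print(tang_phi(tau, r=4, sigma2=4.0))
```

Doubling the number of replications doubles the treatment part of the expected reduction and multiplies $\phi$ by $\sqrt{2}$, which is one way to see why power increases with replication.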

17.2. Assumptions Made in the Experimental Model. Before using
the analysis of variance to summarize the results of an experiment, it is
advisable to check the reasonableness of the assumptions which are set
up. A discussion of these assumptions, the consequences of their not
being true, and some methods of handling aberrant data are presented
in the March, 1947, issue of Biometrics.⁸ A brief summary of these
assumptions is presented here, but the reader might prefer to reread this
section after going over the next two chapters.

(i) In order to connect the analysis of variance with the theory of linear
regression, we must assume that the various fixed effects and the error are
additive. If the physical make-up of the data or of the operations producing
the data are of such nature that the effects are really not additive,
then the sums of squares attributable to such effects by the analysis of
variance do not represent the true effects. If nonadditive effects are
actually multiplicative, then we could use the logarithms of the original
data and have additive effects. Some other type of transformation
might be useful for other kinds of nonadditivity. Except for the
multiplicative case, it is usually assumed that the additive assumption is a
good first approximation to the true relationship and that refinements
here would produce only minor improvements in the analysis. In
general, if the treatment and block effects are small, for example, the
largest mean is not more than 50 per cent greater than the smallest, it is
doubtful that one needs to worry about nonadditivity. But if there are
large treatment or block effects, caution should be exercised in the use
of an additive linear model. With large treatment differences, these
differences may not be the same from one block to another. This is called
an interaction between treatments and blocks; if this interaction is also
additive, a design may be set up to measure it (see Chap. 23). For
some recent ideas on this subject, see reference 9.
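The remark about logarithms can be illustrated with a small constructed table. In the sketch below the yields are hypothetical and are built to be exactly multiplicative (a base yield times a treatment factor times a block factor); the helper `interaction_contrasts` is an assumed name. On the original scale the treatment differences change from block to block, while on the log scale they are identical.

```python
from math import log

# Hypothetical multiplicative yields: 10 * treatment factor * block factor.
treatment_factor = [1.0, 1.5, 3.0]
block_factor = [1.0, 2.0]
yields = [[10.0 * t * b for b in block_factor] for t in treatment_factor]


def interaction_contrasts(table):
    """(row i - row 0) in block 2 minus the same difference in block 1.

    All zeros mean the treatment differences are the same in both blocks,
    that is, the effects are additive on this scale.
    """
    return [(row[1] - table[0][1]) - (row[0] - table[0][0])
            for row in table[1:]]


raw = interaction_contrasts(yields)
logged = interaction_contrasts([[log(y) for y in row] for row in yields])
print("original scale:", raw)
print("log scale:     ", [round(c, 12) for c in logged])
```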

(ii) In addition, the errors must be noncorrelated. In field experiments,
it is known that the errors in adjacent plots are usually positively
correlated. A device, called randomization, is used to circumvent much of
this difficulty. By this device, the treatments are allocated at random
to the experimental units, subject to the design restrictions mentioned
in the previous section, so that there is an equal chance of any two
treatments appearing in adjacent and in nonadjacent plots. Hence the
expected value of the total error for any one treatment is independent
of that for any other treatment. It should be emphasized that randomization
does not remove the correlation between the inherent characteristics
of adjacent plots; rather it provides a mechanism by which this
expected correlation between two treatments tends to cancel as the
number of experimental units per treatment is increased. A more
complete treatment of this subject is given by R. A. Fisher,² Cochran,⁸
and Yates.¹⁰ Cochran and Cox¹ (page 8) make the following remark
concerning randomization: "Randomization is somewhat analogous to
insurance, in that it is a precaution against disturbances that may or may
not occur and that may or may not be serious if they do occur. It is
generally advisable to take the trouble to randomize even when it is not
expected that there will be any serious bias from failure to randomize."
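In practice the random allocation is easily carried out by machine. The sketch below shows one possible way for two of the designs listed in Sec. 17.1: a completely randomized layout and a randomized complete-blocks layout. The treatment labels, the seed, and the function names are hypothetical; the seed is fixed only so that the illustration is repeatable.

```python
import random


def completely_randomized(treatments, reps, rng):
    """Each treatment appears `reps` times; the order over all units is random."""
    layout = [t for t in treatments for _ in range(reps)]
    rng.shuffle(layout)
    return layout


def randomized_blocks(treatments, n_blocks, rng):
    """Every block holds each treatment once, in an independent random order."""
    return [rng.sample(treatments, len(treatments)) for _ in range(n_blocks)]


rng = random.Random(1947)            # fixed seed for a repeatable illustration
treatments = ["A", "B", "C", "D"]    # hypothetical treatment labels

print(completely_randomized(treatments, reps=3, rng=rng))
for block in randomized_blocks(treatments, n_blocks=3, rng=rng):
    print(block)
```

The only restriction in the first layout is the number of units per treatment; in the second, the restriction that each block be complete is imposed before the order within the block is drawn.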

Such data as daily price and production figures cannot fulfill the
randomization requirement. Hence it is generally stated that the
analysis of variance is not a valid method of testing for yearly or monthly 
price differences. To date, the mathematical difficulties of testing for the 
existence of correlations between errors have prevented any definite 
statements on the validity of the analysis of variance for such data. 
Research in the field of serial correlation may lead to an approximate 
solution of this problem. The serial correlation coefficient measures the 
correlation between successive members of a series. See reference 11 for 
some recent articles on this subject. 

(iii) In order to have a simple analysis of variance, it is desirable that
the errors be the same from one experimental unit to another, regardless of the
treatment used. Also, if the errors are not equal, it is necessary to know
the relative sizes of the variances before the experiment is conducted, a
knowledge which is generally not available. Sometimes the data can be
split into parts, each part with homogeneous errors, but with unequal
errors from one part to another. An example of this is presented in
Chap. 3 of reference 1 and by Cochran.⁸ For example, it often happens
that some experimental units are subjected to a control treatment, which
often is no treatment at all. If the experimental units are quite variable
and the treatment is effective in raising the yield, there may be much
more variability between control yields than between yields from treated
plots, because the treatments will tend to increase the yields of the poor
plots more than the good plots (hence decreasing the variability).

Bartlett⁸ discusses various transformations of the original data to
stabilize the variance when there is a fixed relationship between the mean
and the variance. For example, we have shown that if a sample is
drawn from a Poisson population, the mean and the variance will tend to
be the same. In this case, it can be shown that if the dependent variable
$Z = \sqrt{Y}$ is used in the analysis, the variance of $Z$ will be approximately
the same for all levels of $Z$; if there are many small values of $Y$, use
$Z = \sqrt{Y + \tfrac{1}{2}}$. Many social, economic, and production data tend to
have the standard deviation increase with the mean; the variance of the
transformed variate $Z = \log Y$ will tend to be approximately stable.
However, it should be apparent in all of these cases that if the treatment
and block differences are small, there is seldom any need for a
transformation.
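The stabilizing effect of the square-root transformation on Poisson data can be checked without sampling, by summing over the Poisson probabilities directly. The sketch below is such a check; the function names and the particular means examined are illustrative assumptions.

```python
from math import exp, sqrt


def poisson_pmf(mu, k_max):
    """Poisson probabilities p_0 .. p_k_max, built up by recurrence."""
    probs = [exp(-mu)]
    for k in range(1, k_max + 1):
        probs.append(probs[-1] * mu / k)
    return probs


def variance_of(transform, mu):
    """Variance of transform(Y) for Y ~ Poisson(mu), summed over the pmf."""
    k_max = int(mu + 12 * sqrt(mu) + 30)      # well into the upper tail
    probs = poisson_pmf(mu, k_max)
    m1 = sum(p * transform(k) for k, p in enumerate(probs))
    m2 = sum(p * transform(k) ** 2 for k, p in enumerate(probs))
    return m2 - m1 * m1


for mu in (2, 5, 20, 50):
    raw = variance_of(lambda y: y, mu)
    root = variance_of(sqrt, mu)
    shifted = variance_of(lambda y: sqrt(y + 0.5), mu)
    print(f"mean {mu:3d}: var(Y)={raw:6.2f}  var(sqrt Y)={root:.3f}  "
          f"var(sqrt(Y + 1/2))={shifted:.3f}")
```

The variance of $Y$ itself rises with the mean, while the variance of $\sqrt{Y}$ settles near one-quarter once the mean is moderately large; the small means are where the two square-root forms differ most.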

(iv) If it is desired to use the analysis of variance to make the usual tests
of significance or set up confidence limits, the errors must be normally
distributed. The principal nonnormal feature to be concerned about is
skewness. But even this probably does not affect two-tailed tests too
much. However, there may be serious biases for single-tailed tests or
confidence limits. Some investigations have been made on the effects
of nonnormality on the significance levels and power of tests (see
Cochran⁸), but few general results have been obtained. Perhaps more
empirical sampling studies should be made.

17.3. Principles of Setting Up an Experiment. While the statistician 
and experimenter are deciding on a realistic model which can be analyzed 
by available techniques (including transformations, if necessary), the 
details of how the experiment is to be conducted and analyzed should also 
be outlined. 

(i) The experimenter should clearly set forth his objectives before proceeding 
with the experiment. Is this a preliminary experiment to determine the 
future course of experimentation, or is it intended to furnish answers to 
immediate questions? Are the results to be carried into practical use at 
once, or are they to be used to explain aspects of theory not adequately 
understood before? Are you mainly interested in estimates or in tests of 
significance? Over what range of experimental conditions do you wish to 
extend your results? 

(ii) The experiment should be described in detail. The treatments should
be clearly defined. Is it necessary to use a control treatment in order to
make comparisons with past results? The size of the experiment should
be determined. If insufficient funds are available to conduct an experiment
from which useful results can be obtained, the experiment should
not be started. And above all, the necessary material to conduct the
experiment should be available.

(iii) An outline of the analysis should be drawn up before the experiment 
is started. 

17.4. Methods of Increasing the Accuracy of the Experiment. Accuracy
refers to the success of estimating the true value of a quantity; it is
often confused with precision, which refers to the clustering of sample
values about their own average, which will not be the true value if this
average is biased. Precision can be thought of as the inverse of variance.
Hence accuracy is a more inclusive term, since it involves both biasedness 
and precision. Often the experimenter has to choose between an unbiased 
estimate with rather low precision (high variance) and a slightly biased 
one with high precision. The choice of the proper estimate is often 
dictated by circumstances beyond his control, but certain methods of 
increasing the accuracy of the experiment should be kept in mind. If 
there is no bias in the experimental and estimation procedures, accuracy 
and precision are for all practical purposes synonymous. 
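One common way to make this distinction concrete, offered here as a gloss and not as the authors' own formulation, is the mean-square-error decomposition of an estimate $\hat{\theta}$ of a true value $\theta$ (both symbols are introduced only for this illustration):

$$E(\hat{\theta} - \theta)^2 = \operatorname{Var}(\hat{\theta}) + [E(\hat{\theta}) - \theta]^2.$$

The first term on the right is the reciprocal of the precision and the second is the squared bias. When the bias is zero the two notions coincide, and a slightly biased estimate with small variance can have a smaller mean square error than an unbiased estimate with large variance.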

(i) Accuracy can generally be increased by increasing the size of the
experiment. There are certain limitations to this statement, such
as the fact that increasing size may bring in more heterogeneous
material and also may result in a poorer supervision of the experiment
with a possible biased result. The latter point is often true
in industrial and psychological experiments and in sample surveys.
But in general the accuracy of an estimate increases with increasing
size of the experiment. Increasing the number of blocks or
treatments also furnishes more degrees of freedom for the estimate
of the experimental error.

(ii) The experimental techniques should be refined as much as possible. 

(a) There should be a uniform method of applying the treatments 
to the experimental units. 

(b) Sufficient control over external influences should be exercised 
so that every treatment operates under as nearly the same 
conditions as possible. 

(c) Unbiased measures of the treatment effects should be devised
so that they are fully understood by those running the experiment
and by other research workers. As yet we do not have
good enough measures of such things as socioeconomic status,
educational progress, and economic conditions to enable us to
compare adequately the results of one experiment or sample
with another.

(d) As far as possible, checks should be set up to avoid gross errors 
in recording and analyzing the data. 

(iii) The experimental material should be selected to suit the experiment. 

(a) The size and shape of the experimental unit should be prepared
to achieve maximum accuracy and unbiasedness.

(b) Often additional measurements can be taken to help explain 
the final results (see Chap. 21 on covariance techniques). 

(c) Finally the treatments should be grouped together in the best
manner. In other words, the proper selection of the experimental
design is of the utmost importance. If there are too
many treatments or the experimental units are quite heterogeneous,
an incomplete-blocks design might be used. If
interactions are important, some type of factorial design is
needed; if higher-order interactions are not important, some
system of confounding might be used (see Sec. 20.6). The
problem of whether to balance the treatments or not must be
decided on the basis of the importance of various comparisons
and the amount of experimental material available.

Anyone who has attended lectures on experimental designs by Prof.
Gertrude Cox will recognize the origin of many of the above remarks,
which also have been included in reference 1.

CHAPTER 18

THE ANALYSIS OF DESIGNS IN COMPLETE BLOCKS 

18.1. Steps in the Analysis. In the analysis of experimental data,
we shall generally consider the following steps:

(i) Construct a regression model which expresses symbolically the
nature of the variables used in the experiment and the method of
conducting the experiment.

(ii) Determine estimates of the parameters in the regression model by
least squares.

(iii) Indicate the reduction in the total variation due to the regression
in an analysis of variance table.

(iv) Compute $s^2$ and the necessary $F$ values.

(v) Compute variances of the estimates in (ii) and confidence limits.

(vi) If possible, compare the error variance using this design with the
expected variance if some other design had been used; this is called the
efficiency of the design.

(vii) Present actual examples with a discussion of the results.

18.2. Completely Randomized Design. (i) Suppose we have $p$ different 
treatments which are to be tested for yield-producing ability. Let 
us assume that each treatment is planted on $r$ different experimental units, 
which we shall call plots; $r$ is often called the number of replications for 
each treatment. Then we might estimate the yield for the $i$th treatment 
on the $j$th plot of its $r$ plots as 

$$Y_{ij} = \mu + \sum_{k=1}^{p} \tau_k X_{ki} + \epsilon_{ij},$$

where 

$$X_{ki} = \begin{cases} 0 & \text{for } k \neq i \\ 1 & \text{for } k = i \end{cases} \qquad i = 1, 2, \ldots, p; \; j = 1, 2, \ldots, r,$$

$\tau_i$ is the differential effect of the $i$th treatment (over the mean, $\mu$) and 
$\sum_i \sum_k \tau_k X_{ki} = 0$. Some restriction such as the last one is necessary, since 
there will be only $p$ treatment totals to estimate $(p + 1)$ constants ($\mu$ and 
the $p$ $\tau$'s). If the restriction mentioned above ($\sum\sum \tau_k X_{ki} = 0$) is used, 
$\mu$ will be the over-all mean effect. 

Since the $X$'s are 0 or 1, the regression model can be simplified to 

$$Y_{ij} = \mu + \tau_i + \epsilon_{ij}, \qquad \text{where } \sum_i \tau_i = 0.$$

As mentioned in Chap. 17, it is assumed that the $\epsilon_{ij}$ are 
NID$(0, \sigma^2)$ and independent of the $\tau_i$. Hence the expected mean yield for 
the $i$th treatment is $\mu + \tau_i$. This assumes that there is no interaction 
between the treatment and the error, that the treatment and error are 
truly additive; otherwise, the difference between $Y$ and $\mu + \tau_i$ on a 
particular plot would not be the same for all treatments. The assumption 
that all $\epsilon_{ij}$ have the same variance, $\sigma^2$, enables us to estimate $\sigma^2$ even though 
we have but one observation per plot. Actually it is assumed that the 
possible errors in repeated sampling at one plot are $N(0, \sigma^2)$, and we would 
like to estimate $\sigma^2$ by repeated sampling at this plot. But it is impossible 
in most experiments to do this (and still maintain the same other conditions), 
hence we estimate $\sigma^2$ from the plot-to-plot variation of a single 
experiment. In order to do this, we must assume all $\epsilon$'s have the same $\sigma^2$. 

One method of distinguishing between the $\tau$'s and the $\epsilon$'s has been to 
call the $\tau$'s fixed effects and the $\epsilon$'s random effects. By fixed effects, we imply 
that all of the treatments about which inferences are to be made are 
included in the experiment. A random effect is assumed to be one of a 
large number of possible effects; in general, we shall regard the number of 
possible effects to be infinite. However, it is not too difficult to set up the 
theory corresponding to the assumption that the number of possible random 
effects is a known finite number (greater than the number in the 
sample). In the model used here, the $\epsilon$'s represent the differences in the 
yielding abilities of the various plots, plus errors made in applying the 
treatments to the plots, in measuring the size of the plots, and even in 
measuring the actual yields produced. If the $\epsilon$'s represented only the 
differences in the yielding ability of the plots, one might believe that the 
number of $\epsilon$'s would be fixed; however, the other components of the $\epsilon$'s are 
certainly unlimited in number. But we do not feel that even the yielding 
ability of a given plot is fixed, because this will change with time and will 
be influenced at a given time by particular environmental conditions 
which affect that plot and not the others, such as weather for crops and 
social and psychological factors with people. Hence, on all counts it 
seems reasonable to regard the plot errors ($\epsilon$) as random variables, or 
variates. 

(ii) We wish to estimate the values of $\mu$, $\{\tau_i\}$, and $\sigma^2$ from an experiment 
which has produced $r$ yields for each of the $p$ treatments.† The 
method of experimentation is to select $rp$ plots in the field and then to 
allocate $p$ treatments at random on these plots, with the restriction that 
each treatment should appear on $r$ plots.‡ Let us designate the estimates 
of $\mu$ and $\tau_i$ as $m$ and $t_i$. Hence the regression equation can be set up in 
the form 

$$Y_{ij} = m + t_i + e_{ij},$$

where the residual $e_{ij}$ is the estimate of $\epsilon_{ij}$, and $m$ and the $\{t_i\}$ are to be 
chosen so as to minimize $\sum e_{ij}^2$. 

† Or we may consider a survey with $p$ classes and $r$ samples from each class. 

The normal equations are 

$$m: \quad rpm + r\sum_i t_i = \sum_{i,j} Y_{ij} = G,$$

$$t_i: \quad rm + rt_i = \sum_j Y_{ij} = T_i, \qquad i = 1, 2, \ldots, p,$$

where $T_i$ is the total yield for the $i$th treatment and $G$ the grand total for 
the experiment and where the $m$ equation is found by differentiating $\sum e_{ij}^2$ 
with respect to $m$, and similarly for $t_i$. If the $t_i$ equations are added 
together, this sum is exactly the $m$ equation. This shows that the 
$p$ treatment equations are not independent of the equation for $m$. In 
order to solve these equations, an auxiliary relationship must be used. 
Since $\sum \tau_i = 0$, a reasonable relationship is 

$$\sum_i t_i = 0,$$

so that each $t_i$ measures a differential effect from the mean. Also, this 
makes $m = G/rp = \bar{Y}$, the sample mean. And 

$$t_i = (T_i/r) - m = (T_i - \bar{T})/r,$$

where $\bar{T} = G/p$. If the average yield for the $i$th treatment is desired, we 
calculate 

$$t'_i = t_i + m = T_i/r.$$
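The solution of the normal equations above can be checked numerically. The following is a minimal sketch, assuming a small set of hypothetical yields (the numbers and the variable names `yields`, `t_prime`, `m_residual`, and `t_residuals` are illustrative and do not come from the text); it computes $m$ and the $t_i$ under the restriction $\sum t_i = 0$ and confirms that both kinds of normal equations are satisfied.

```python
# Hypothetical yields (not from the text): p = 3 treatments, r = 4 plots each.
yields = [
    [12.0, 15.0, 11.0, 14.0],
    [18.0, 17.0, 20.0, 21.0],
    [10.0, 9.0, 13.0, 12.0],
]
p = len(yields)       # number of treatments
r = len(yields[0])    # replications per treatment

T = [sum(row) for row in yields]   # treatment totals T_i
G = sum(T)                         # grand total

m = G / (r * p)                    # m = G/rp, using the restriction sum(t_i) = 0
t = [Ti / r - m for Ti in T]       # differential effects t_i = T_i/r - m
t_prime = [Ti / r for Ti in T]     # treatment means t'_i = T_i/r

# Residuals of the m equation and of each t_i equation; all should be ~0.
m_residual = r * p * m + r * sum(t) - G
t_residuals = [r * m + r * ti - Ti for ti, Ti in zip(t, T)]

print("m =", round(m, 3))
print("t =", [round(ti, 3) for ti in t])
print("t' =", t_prime)
```

Because the $t_i$ equations sum to the $m$ equation, the restriction $\sum t_i = 0$ is what makes the solution unique; a different auxiliary relationship would shift $m$ and every $t_i$ by a constant while leaving each $t'_i$ unchanged.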

A simple method of setting up the normal equations is to write down 
the estimated value for each observation and then obtain the estimating 
equation for each effect by adding together all observation equations 
which contain this effect. If the experiment is designed so that the 
various sets of effects are orthogonal, the resultant equation will be a 
function of only the effect under consideration (plus the mean); otherwise 
there will be a mixture of effects, necessitating the simultaneous solution 
of several equations to estimate the effects (as was done in Chap. 15). 
For our completely randomized design, the observation equations 
involving $t_i$ are: 

Yield                        Effects 
$Y_{i1}$                     $m + t_i = t'_i$ 
$Y_{i2}$                     $m + t_i = t'_i$ 
. . .                        . . . 
$Y_{ir}$                     $m + t_i = t'_i$ 
$T_i = \sum_j Y_{ij}$        $r(m + t_i) = rt'_i$ 

Hence $t'_i$ is simply $T_i/r$. 

‡ The randomization feature is needed to assure that each treatment has the same 
chance of appearing on a given plot. As pointed out in Sec. 17.2, randomization also 
helps to validate the assumption of noncorrelated errors. The errors for adjacent 
plots are often positively correlated. However, when we consider all possible 
randomizations, any two $\epsilon$'s will apply equally often to adjacent and to distant plots. 
Hence, if the errors tend to be as negatively correlated for distant plots as they are 
positively correlated for adjacent plots (a rather reasonable assumption in many 
experiments), the average correlation between these two $\epsilon$'s should be approximately 
zero. And regardless of the plot-to-plot correlation pattern, randomization should 
tend to minimize the effect of this correlation on the errors in the model. 

Since $m$ appears in all of the observation equations, we must add all of 
them together to compute $m$, giving us 

$$\sum_{i,j} Y_{ij} = rpm + r\left(\sum_i t_i\right) = rpm.$$

Hence $m = \sum Y/rp = \bar{Y}$, assuming $\sum_i t_i = 0$. 

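(iii) The error sum of squares is 

$$SSE = \sum_{i,j} (Y_{ij} - m - t_i)^2 = \sum_{i,j} \left(Y_{ij} - \frac{T_i}{r}\right)^2 = \sum_{i,j} Y_{ij}^2 - \frac{\sum_i T_i^2}{r} = Sy^2 - \left(\frac{\sum_i T_i^2}{r} - C\right),$$

where $C = G^2/rp$ and $Sy^2 = \sum_{i,j} Y_{ij}^2 - C$ is the total sum of squares 
adjusted for the mean. Hence the reduction in sum of squares due to 
treatments, called the treatment sum of squares (and indicated by SST), is 

$$SST = \frac{\sum_i T_i^2}{r} - C.$$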
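The middle step in the expression for SSE is compressed. A worked expansion, offered as a sketch using only the symbols already defined ($T_i = \sum_j Y_{ij}$, with $r$ plots per treatment), is:

```latex
\begin{aligned}
\sum_{i,j}\left(Y_{ij} - \frac{T_i}{r}\right)^2
  &= \sum_{i,j} Y_{ij}^2 \;-\; \frac{2}{r}\sum_i T_i \sum_j Y_{ij}
     \;+\; \sum_i r\,\frac{T_i^2}{r^2} \\
  &= \sum_{i,j} Y_{ij}^2 \;-\; \frac{2}{r}\sum_i T_i^2 \;+\; \frac{1}{r}\sum_i T_i^2 \\
  &= \sum_{i,j} Y_{ij}^2 \;-\; \frac{1}{r}\sum_i T_i^2 .
\end{aligned}
```

Adding and subtracting $C = G^2/rp$ then separates the total sum of squares adjusted for the mean from the treatment sum of squares.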


Another method of deriving this result is to use the abbreviated 
Doolittle results. If we eliminate $m$ from the $t_i$ equation, we have 

$$rt_i = T_i - G/p = T_i - \bar{T}.$$

Hence the $A_{iy}$ value is $T_i - \bar{T}$, and the $B_{iy}$ value is $(T_i - \bar{T})/r$. Therefore 

$$SST = \sum_{i=1}^{p} A_{iy} B_{iy} = \sum_{i=1}^{p} (T_i - \bar{T})^2/r = \left(\sum_i T_i^2/r\right) - C.$$

This simple result is obtained because the $i$th equation contains $t_i$ and no other $t$. 

Expressed in terms of the original model, 

$$T_i - \bar{T} = \left(r\mu + r\tau_i + \sum_{j=1}^{r} \epsilon_{ij}\right) - \frac{1}{p}\left(rp\mu + r\sum_i \tau_i + \sum_{i,j} \epsilon_{ij}\right)$$

and 

$$E(T_i - \bar{T}) = r(\tau_i - \bar{\tau}).$$

But it has been shown that $\sum_i \tau_i = p\bar{\tau} = 0$ is a necessary part of the 
original model. Therefore 

$$E[(T_i - \bar{T})^2] = r^2\tau_i^2 + \left[\left(\frac{p-1}{p}\right)^2 r + \left(\frac{1}{p}\right)^2 r(p-1)\right]\sigma^2 = r^2\tau_i^2 + \frac{r(p-1)}{p}\sigma^2$$

and 

$$E(SST) = r\sum_{i=1}^{p} \tau_i^2 + (p-1)\sigma^2.$$
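The bracketed variance term comes from counting how each error enters $T_i - \bar{T}$. A short derivation, given here as a sketch that assumes only the independence and common variance $\sigma^2$ of the $\epsilon_{ij}$ stated earlier, is:

```latex
\begin{aligned}
T_i - \bar{T} &= r\tau_i + \frac{p-1}{p}\sum_{j}\epsilon_{ij}
                 - \frac{1}{p}\sum_{k \neq i}\sum_{j}\epsilon_{kj}, \\
\sigma^2(T_i - \bar{T}) &= \left(\frac{p-1}{p}\right)^2 r\sigma^2
                 + \left(\frac{1}{p}\right)^2 r(p-1)\sigma^2
                 = \frac{r(p-1)}{p}\,\sigma^2, \\
E(\mathrm{SST}) &= \frac{1}{r}\sum_{i=1}^{p} E\!\left[(T_i - \bar{T})^2\right]
                 = r\sum_{i=1}^{p}\tau_i^2 + (p-1)\sigma^2 .
\end{aligned}
```

The first error term collects the $r$ errors on the $i$th treatment, each with coefficient $(p-1)/p$; the second collects the $r(p-1)$ errors on the other treatments, each with coefficient $-1/p$.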


The analysis-of-variance table is: 

Source of      Degrees of    Sum of     Mean square                E(MS)† 
variation      freedom       squares 
Treatments     $p - 1$       SST        MST = SST/$(p - 1)$        $\sigma^2 + r\sum \tau_i^2/(p - 1)$ 
Error          $p(r - 1)$    SSE        $s^2$ = SSE/$p(r - 1)$     $\sigma^2$ 

† E(MS) = expected value of the mean square. 


(iv) From Chap. 14, we know that SSE is distributed as $\chi^2\sigma^2$ with 
$n - p = rp - p = p(r - 1)$ degrees of freedom. If we set 

$$s^2 = SSE/p(r - 1),$$

$E(s^2) = \sigma^2$. Hence we can use $F = MST/s^2$ to test $H_0: \{\tau_i = 0\}$ against 
the alternative that some $\tau_i \neq 0$. The one-tailed $F$ test is used since the 
numerator of $F$ is expected to be greater than the denominator when any 
$\tau_i \neq 0$. 

(v) Since $t'_i = T_i/r = (r\mu + r\tau_i + \sum_j \epsilon_{ij})/r$ and $m = G/rp = \mu + \bar{\epsilon}$, 
it is seen that 

$$E(t'_i) = \mu + \tau_i, \qquad E(m) = \mu, \qquad \sigma^2(t'_i) = \sigma^2/r.$$

Also for any other treatment $t'_{i'} = T_{i'}/r$, and 

$$\sigma(t'_i t'_{i'}) = 0.$$

The estimated difference between the mean effects of these two treatments is 

$$d = t_i - t_{i'} = t'_i - t'_{i'}, \qquad E(d) = \delta = \tau_i - \tau_{i'}.$$

Since $t'_i$ and $t'_{i'}$ are noncorrelated, $\sigma^2(d) = 2\sigma^2/r$ and the confidence limits 
for $\delta$ are 

$$d - t_\alpha s\sqrt{2/r} < \delta < d + t_\alpha s\sqrt{2/r}.$$

Example 18.1. As a first example, we might consider an experiment 
which was run to investigate the number of warp skein breaks on tent 
twill in 5 consecutive days of testing with 4 breaks per test.¹ The 
results are: 

Day          1       2       3       4       5   Total 
            30      30      40      45      55 
            35      40      45      40      45 
            25      45      55      35      60 
            40      45      35      40      50 
Total      130     160     175     160     210     835 
Mean      32.5    40.0   43.75    40.0    52.5   41.75 

Analysis of Variance 

Source of      Degrees of    Sum of     Mean 
variation      freedom       squares    square 
Days                4        845.00     211.25 
Error              15        668.75      44.58 

The error variance is estimated to be 44.58. There are significant 
differences among the day means, since $F = 211.25/44.58 = 4.74$ with 4 and 
15 degrees of freedom. Also, the standard error of the difference between 
any two day means is given by $s(d) = \sqrt{2(44.58)/4} = 4.72$. Hence 
the 95 per cent confidence limits for the difference between the mean 
number of breaks for the first two days are 

$$7.5 - 10.1 < \delta < 7.5 + 10.1 \quad \text{or} \quad -2.6 < \delta < 17.6,$$

where $10.1 = ts(d) = (2.13)(4.72)$, $t$ with 15 degrees of freedom. This 
example will be discussed further in Exercise 18.2. (See Fig. 18.1 for a 
graphical presentation.) 

Fig. 18.1. Number of warp skein breaks in 5 consecutive days of testing. 
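The computations of Example 18.1 can be retraced step by step. The sketch below assumes the data table as printed above and takes the tabled value $t = 2.13$ (15 degrees of freedom) from the text rather than computing it; the variable names are illustrative.

```python
import math

# Warp-skein breaks from Example 18.1: one list per day (treatment).
days = [
    [30, 35, 25, 40],
    [30, 40, 45, 45],
    [40, 45, 55, 35],
    [45, 40, 35, 40],
    [55, 45, 60, 50],
]
p = len(days)       # number of treatments (days)
r = len(days[0])    # replications per treatment

totals = [sum(d) for d in days]        # T_i
G = sum(totals)                        # grand total
C = G ** 2 / (r * p)                   # correction term G^2/rp

total_ss = sum(y ** 2 for d in days for y in d) - C
sst = sum(T ** 2 for T in totals) / r - C      # treatment (days) sum of squares
sse = total_ss - sst                           # error sum of squares

mst = sst / (p - 1)
s2 = sse / (p * (r - 1))                       # error mean square
F = mst / s2

se_diff = math.sqrt(2 * s2 / r)                # s(d) for two treatment means
t_05 = 2.13                                    # tabled t, 15 d.f., from the text
d = totals[1] / r - totals[0] / r              # day 2 mean minus day 1 mean
limits = (d - t_05 * se_diff, d + t_05 * se_diff)

print("SST =", sst, " SSE =", sse)
print("s^2 =", round(s2, 2), " F =", round(F, 2))
print("s(d) =", round(se_diff, 2), " limits =", [round(x, 1) for x in limits])
```

The sums of squares are obtained from totals and the correction term $C$, exactly as in the hand computation, rather than from deviations about the means.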

Example 18.2. A second example presents an analysis of the gains in 
weight (grams per 100 days) of rats on a stock ration with various 
amounts of gossypol added (Halverson and Sherwood, North Carolina 
State College, 1932), as shown in Table 18.1. 

Table 18.1 

                     Amounts of gossypol added 
               None    .04%    .07%    .10%    .13%    Total 
                228     186     179     130     154 
                229     229     193      87     130 
                218     220     183     135     130 
                216     208     180     116     118 
                224     228     143     118     118 
                208     198     204     165     104 
                235     222     114     151     112 
                229     273     188      59     134 
                233     216     178     126      98 
                219     198     134      64     100 
                224     213     208      78     104 
                220             196      94 
                232                     150 
                200                     160 
                208                     122 
                232                     110 
                                        178 
No. of rats      16      11      12      17      11       67 
Sum           3,555   2,391   2,100   2,043   1,302   11,391 
Mean          222.2   217.4   175.0   120.2   118.4    170.0 

There is an added complication in the analysis of these data, because 
the number of rats varied from treatment to treatment. However, the 
analysis of variance proceeds quite simply, as follows: 

$$C = \frac{(11{,}391)^2}{67} = 1{,}936{,}640.$$

Sums of Squares 

Total: $(228)^2 + (229)^2 + \cdots + (104)^2 - C = 178{,}985.$ 

Rations: $\frac{(3{,}555)^2}{16} + \frac{(2{,}391)^2}{11} + \frac{(2{,}100)^2}{12} + \frac{(2{,}043)^2}{17} + \frac{(1{,}302)^2}{11} - C = 140{,}083.$ 

Source of      Degrees of    Sum of      Mean 
variation      freedom       squares     square 
Rations             4        140,083     35,020.75 
Error              62         38,902        627.45 

In this experiment, $s^2 = 627.45$, $F = 55.8$, a highly significant value 
($\alpha < .01$). See Exercise 18.3 for more details. 
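With unequal numbers per treatment, each squared total is divided by its own number of observations. The sketch below assumes the ration sums and numbers of rats from Table 18.1 and takes the total sum of squares (178,985) as reported in the text instead of recomputing it from the individual gains.

```python
# Ration sums and numbers of rats from Table 18.1.
counts = [16, 11, 12, 17, 11]
totals = [3555, 2391, 2100, 2043, 1302]
total_ss = 178985          # total sum of squares, as reported in the text

n = sum(counts)            # total number of rats
G = sum(totals)            # grand total
C = G ** 2 / n             # correction term

# Each squared total is weighted by 1/r_i because the r_i differ.
sst = sum(T ** 2 / r_i for T, r_i in zip(totals, counts)) - C
sse = total_ss - sst

df_rations = len(counts) - 1
df_error = n - len(counts)
s2 = sse / df_error
F = (sst / df_rations) / s2

print("n =", n, " G =", G, " C =", round(C))
print("SST =", round(sst), " SSE =", round(sse))
print("s^2 =", round(s2, 2), " F =", round(F, 1))
```

Only the treatment sum of squares changes form relative to the equal-numbers case; the error degrees of freedom become $n - p = 62$.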

18.3. Randomized Complete Blocks. (i) Suppose that the $rp$ plots 
are divided into $r$ blocks of $p$ plots each and that each treatment is 
assigned at random to one of these plots in each block. Then the 
estimation equation is 

$$Y_{ij} = \mu + \tau_i + \beta_j + \epsilon_{ij} = m + t_i + b_j + e_{ij},$$

where $\beta_j$ is the differential effect of the $j$th block ($j = 1, 2, \ldots, r$), being 
estimated by $b_j$. In this case we assume that the block 
effects are also fixed; that is, inferences are to be made for these particular 
$p$ treatments applied only to these $r$ blocks. Since most experimentation 
is set up to make inferences over a wider variety of experimental conditions 
than we have in the particular blocks used in a given experiment, the 
blocks usually are regarded as representative of a population which is 
larger than the sample. The analysis of data under these assumptions is 
not too different from that presented in this section; this problem will be 
discussed in Chap. 23. It is further assumed that there is no real 
treatment by block interaction. A discussion of the problem when there is 
interaction (that is, treatment effects are not constant from block to 
block) also will be presented in Chap. 23. (See the footnote on page 
229 for a discussion of the need for randomization.) The equations 
for a randomized complete-blocks experiment with 3 treatments and 4 
blocks are indicated in Fig. 18.2. The treatments (1,2,3) were assigned 
to the plots at random, the random arrangement being (1,3,2), (3,2,1), 
(2,1,3), and (3,1,2). 

Fig. 18.2. An example of the model equations for a randomized complete-blocks 
design ($r = 4$, $p = 3$). 

(ii) The least-squares equations are 

$$m: \quad rpm + r\sum_i t_i + p\sum_j b_j = G,$$

$$t_i: \quad r(m + t_i) + \sum_j b_j = T_i,$$

$$b_j: \quad p(m + b_j) + \sum_i t_i = B_j,$$

where $B_j$ is the total yield for the $j$th block and the $b_j$ equation is found by 
minimizing $\sum e_{ij}^2$ with respect to $b_j$. In this case, two auxiliary equations 
are required; in order to make $m = \bar{Y} = G/rp$, an unbiased estimate of 
$\mu$, we set $\sum t_i = \sum b_j = 0$. Hence $t_i = (T_i - \bar{T})/r$, and $b_j = (B_j - \bar{B})/p$, 
where $\bar{T} = G/p$ and $\bar{B} = G/r$. We let 

$$t'_i = t_i + m = T_i/r; \qquad b'_j = b_j + m = B_j/p.$$

(iii) Using the abbreviated Doolittle approach, we see that the $t$ 
equations are the same as in Sec. 18.2 (iii) and that 

$$pb_j = B_j - \bar{B}.$$

Hence the reduction in sum of squares due to blocks and treatments is 

$$\frac{\sum_i (T_i - \bar{T})^2}{r} + \frac{\sum_j (B_j - \bar{B})^2}{p}.$$

Therefore the added reduction due to blocks alone (indicated by SSB) 
must be 

$$SSB = \frac{\sum_j (B_j - \bar{B})^2}{p}.$$

We note that 

$$rt_i = T_i - \bar{T} = r\tau_i + \sum_j \epsilon_{ij} - \sum_{i,j} \epsilon_{ij}/p,$$

$$pb_j = B_j - \bar{B} = p\beta_j + \sum_i \epsilon_{ij} - \sum_{i,j} \epsilon_{ij}/r.$$

Hence $E(t_i) = \tau_i$ and $E(b_j) = \beta_j$. Furthermore, it can be shown that 
$\sigma[(T_i - \bar{T})(B_j - \bar{B})] = 0$. This also proves that the treatment and 
block effects are noncorrelated; hence, SST and SSB are orthogonal parts 
of the total reduction (the total reduction equals SST plus SSB). 

The analysis of variance table is: 

Source of      Degrees of          Sum of     Mean      E(MS) 
variation      freedom             squares    square 
Blocks         $r - 1$             SSB        MSB       $\sigma^2 + p\sum \beta_j^2/(r - 1)$ 
Treatments     $p - 1$             SST        MST       $\sigma^2 + r\sum \tau_i^2/(p - 1)$ 
Error          $(r - 1)(p - 1)$    SSE        $s^2$     $\sigma^2$ 

(iv) $s^2 = SSE/(r - 1)(p - 1)$, and $F = MST/s^2$ to test for over-all 
treatment differences. Note that $SSE = Sy^2 - SST - SSB$. There is 
usually no reason to test for block differences, since the blocks are generally 
chosen to be different. 

(v) As in 18.2, $\sigma^2(t') = \sigma^2/r$, and $\sigma^2(d) = 2\sigma^2/r$. 

(vi) The experimenter often is interested in estimating how much he 
reduced his estimated error variance by the imposition of the block 
restrictions on the design. In other words, how much would he have 
increased his estimate of the error variance if the treatments had been 
randomly distributed over all $rp$ plots, instead of being randomly 
distributed over the $p$ plots of each of $r$ blocks? Let us call $s_{cr}^2$ the error 
variance for the completely randomized design and $s_{rb}^2$ that for the 
randomized complete-blocks design. Then the estimated efficiency of 
the latter design relative to the former is given by 

$$I = s_{cr}^2/s_{rb}^2.$$

Since the estimated variance of the difference between two treatment 
means is 

$$s^2(d) = 2s^2/r,$$

$rI$ measures the number of replications needed for a completely 
randomized design to give the same value of $s^2(d)$ as $r$ replications in a randomized 
complete-blocks design. 

Since we have only the analysis of variance for the randomized 
complete-blocks design, it is necessary to use these data to estimate $s_{cr}^2$. This 
estimate is as follows: 

$$s_{cr}^2 = \frac{s_{rb}^2(\text{error d.f.} + \text{treatment d.f.}) + SSB}{\text{error d.f.} + \text{treatment d.f.} + \text{block d.f.}} = \frac{r(p - 1)s_{rb}^2 + SSB}{rp - 1},$$

where d.f. stands for “degrees of freedom.” We shall derive this formula 
by considering a uniformity trial, in which all the treatments are alike. 
In this case $s_{cr}^2$ is the total sum of squares for a completely randomized 
design divided by $(rp - 1)$. But since we are concerned with the same 
field as with our randomized complete-blocks design, SSB and SSE will 
remain unchanged (on the average for all possible randomizations). 
Since we are considering a uniformity trial, the treatment sum of squares 
will be an estimate of $(p - 1)\sigma^2$; hence, the best estimate of this treatment 
sum of squares will be $(p - 1)s_{rb}^2$. Adding these together and dividing 
by $(rp - 1)$, we find that 

$$s_{cr}^2 = \frac{SSE + (p - 1)s_{rb}^2 + SSB}{rp - 1} = \frac{r(p - 1)s_{rb}^2 + SSB}{rp - 1}.$$

We note that if there are no block effects, $E(s_{cr}^2) = \sigma^2$; however, if there 
are block effects, $E(s_{cr}^2) > \sigma^2$. 
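The last remark can be made explicit with the expected mean squares from the analysis of variance table for randomized complete blocks. The following worked step is a sketch; it writes $s_{cr}^2$ and $s_{rb}^2$ for the completely randomized and randomized complete-blocks error variances and uses $E(\text{MSB}) = \sigma^2 + p\sum\beta_j^2/(r-1)$.

```latex
\begin{aligned}
E(\mathrm{SSB}) &= (r-1)\,\sigma^2 + p\sum_j \beta_j^2, \qquad
E\!\left[r(p-1)\,s_{rb}^2\right] = r(p-1)\,\sigma^2, \\
E\!\left(s_{cr}^2\right)
  &= \frac{r(p-1)\,\sigma^2 + (r-1)\,\sigma^2 + p\sum_j \beta_j^2}{rp-1}
   = \sigma^2 + \frac{p\sum_j \beta_j^2}{rp-1}.
\end{aligned}
```

The excess over $\sigma^2$ vanishes only when every $\beta_j = 0$, which is the case of no block effects.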

At first glance one might think that, since the only difference between 
the two designs is that the block effects are included in $s_{cr}^2$ and not in $s_{rb}^2$, 
the best estimate of $s_{cr}^2$ would be $(SSB + SSE)/p(r - 1)$. However, 
since the total sum of squares is assumed the same for both designs, this 
would assume (falsely) that SST is the same for both designs. Cochran 
and Cox⁴ present a more rigorous proof, which avoids the use of the 
uniformity device by estimating the average value of SST for both designs. 

Actually the efficiency is slightly less than $I$, because there is a loss in 
the number of error degrees of freedom when the block sum of squares is 
removed from the error. Fisher⁶ calculates the amount of information 
which $d$ furnishes concerning $\delta$, the true difference between two means. 
If $\sigma^2$ were known, the amount of information would be proportional to $1/\sigma^2$; 
with $\sigma^2$ estimated by $s^2$, the amount of information is $(n + 1)/(n + 3)s^2$, 
where there are $n$ degrees of freedom for the error mean square. Hence if 
there are $n_1$ degrees of freedom for one design with error variance $s_1^2$ and $n_2$ 
degrees of freedom for a second design with error variance $s_2^2$, the efficiency 
of the first design relative to the second is 

$$\frac{(n_1 + 1)(n_2 + 3)s_2^2}{(n_1 + 3)(n_2 + 1)s_1^2}.$$

Since we have set $s_2^2/s_1^2 = I$ (the first design being the randomized 
complete-blocks design), $I$ would be multiplied by 

$$h = \frac{(n_1 + 1)(n_2 + 3)}{(n_1 + 3)(n_2 + 1)}$$

to adjust for the loss in degrees of freedom. 

Cochran and Cox⁴ (pages 27 to 28) discuss several other methods of 
adjusting for the loss of degrees of freedom. One might consider yet 
another adjusting constant 

$$h' = \frac{F_\alpha[(p - 1), n_2]}{F_\alpha[(p - 1), n_1]},$$

where there are $(p - 1)$ degrees of freedom for treatments and $n_2$ and $n_1$ 
respective degrees of freedom for error. If testing is done at the $\alpha = .05$ 
significance level, $F_{.05}$ should be used. 

Example 18.3. This example presents the analysis of the differences 
between fat absorption by doughnuts in 8 different fats, all being tested 
on each of 6 days. 

Grams of Fat Absorbed by Mixes of 24 Doughnuts during Cooking Period 

Day                      Fat (treatment) number 
(block)      1      2      3      4      5      6      7      8    Total 
1          164    172    177    178    163    163    150    164    1,331 
2          177    197    184    196    177    193    179    169    1,472 
3          168    167    187    177    144    176    146    155    1,320 
4          156    161    169    181    165    172    141    149    1,294 
5          172    180    179    184    166    176    169    170    1,396 
6          195    190    197    191    178    178    183    167    1,479 
Total    1,032  1,067  1,093  1,107    993  1,058    968    974    8,292 
Mean     172.0  177.8  182.2  184.5  165.5  176.3  161.3  162.3    172.8 

Computations 

$$C = \frac{(8{,}292)^2}{48} = 1{,}432{,}443,$$

$$Sy^2 = (164)^2 + (172)^2 + \cdots + (167)^2 - 1{,}432{,}443 = 9{,}143.00,$$

$$SST = \frac{(1{,}032)^2 + (1{,}067)^2 + \cdots + (974)^2}{6} - 1{,}432{,}443 = 3{,}344.33,$$

$$SSB = \frac{(1{,}331)^2 + (1{,}472)^2 + \cdots + (1{,}479)^2}{8} - 1{,}432{,}443 = 3{,}986.75,$$

$$SSE = Sy^2 - SST - SSB = 1{,}811.92.$$

Source of variation      Degrees of    Sum of       Mean 
                         freedom       squares      square 
Between day means             5        3,986.75     797.35 
Between fat means             7        3,344.33     477.76** 
Error                        35        1,811.92      51.77 = $s^2$ 
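The computations for Example 18.3 follow the same pattern as before, with block totals added. The sketch below assumes the doughnut data as tabulated above (rows are days, columns are fats); the variable names are illustrative.

```python
# Fat absorbed (grams): rows are days (blocks), columns are fats (treatments).
fat = [
    [164, 172, 177, 178, 163, 163, 150, 164],
    [177, 197, 184, 196, 177, 193, 179, 169],
    [168, 167, 187, 177, 144, 176, 146, 155],
    [156, 161, 169, 181, 165, 172, 141, 149],
    [172, 180, 179, 184, 166, 176, 169, 170],
    [195, 190, 197, 191, 178, 178, 183, 167],
]
r = len(fat)        # number of blocks (days)
p = len(fat[0])     # number of treatments (fats)

B = [sum(row) for row in fat]            # block totals B_j
T = [sum(col) for col in zip(*fat)]      # treatment totals T_i
G = sum(B)
C = G ** 2 / (r * p)

total_ss = sum(y ** 2 for row in fat for y in row) - C   # Sy^2
sst = sum(x ** 2 for x in T) / r - C                     # each T_i sums r plots
ssb = sum(x ** 2 for x in B) / p - C                     # each B_j sums p plots
sse = total_ss - sst - ssb

s2 = sse / ((r - 1) * (p - 1))
F = (sst / (p - 1)) / s2

print("Sy^2 =", round(total_ss, 2))
print("SST =", round(sst, 2), " SSB =", round(ssb, 2), " SSE =", round(sse, 2))
print("s^2 =", round(s2, 2), " F =", round(F, 2))
```

Note the divisors: treatment totals are squared and divided by $r = 6$, block totals by $p = 8$, matching the number of plots entering each total.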


We note that the efficiency of this design as compared with the 
completely randomized design is 

$$I = \frac{42(51.77) + 3{,}986.75}{47(51.77)} = \frac{6{,}161.09}{2{,}433.19} = 2.53.$$

In this case $h = (36)(43)/(38)(41) = .994$, which is almost unity; $h'$ is 
slightly less. This indicates that if the fats had been distributed 
randomly over the 6 days, the error variance would have been expected to be 
about 2.5 times as large as that with the randomized blocks design where 
each fat was used each day. In other words, we have made our estimates 
of fat differences about 2½ times as precise by planning the experiment so 
that every fat was used on each day of the experiment. 
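The efficiency calculation can be written as two small functions. This is a sketch using the formulas of Sec. 18.3 (vi); the function names are hypothetical, and the error degrees of freedom for the completely randomized design, $n_2 = p(r - 1) = 40$, are inferred from the factors (43) and (41) in the text.

```python
def relative_efficiency(s2_rb, ssb, r, p):
    """Estimated efficiency I of randomized blocks relative to complete randomization."""
    s2_cr = (r * (p - 1) * s2_rb + ssb) / (r * p - 1)   # estimated CR error variance
    return s2_cr / s2_rb


def fisher_adjustment(n1, n2):
    """Fisher's factor h for the loss in error degrees of freedom."""
    return (n1 + 1) * (n2 + 3) / ((n1 + 3) * (n2 + 1))


# Values from Example 18.3: s^2 = 51.77, SSB = 3,986.75, r = 6 days, p = 8 fats.
I = relative_efficiency(51.77, 3986.75, r=6, p=8)
h = fisher_adjustment(n1=35, n2=40)   # 35 = (r-1)(p-1); 40 = p(r-1), inferred

print("I =", round(I, 2))
print("h =", round(h, 3))
print("adjusted efficiency =", round(I * h, 2))
```

Multiplying $I$ by $h$ changes the result only slightly here, which is the point made in the text: with 35 and 40 error degrees of freedom the adjustment is nearly unity.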

The 95 per cent confidence limits for the difference between any two 
fat means are $(d - 8.43 < \delta < d + 8.43)$. A word of caution about the 
use of these confidence limits should be given. These are average 
confidence limits, assuming that you select the treatments to be compared 
in advance of the experiment. If you wait until the experiment is over 
and then select, for example, the highest and lowest treatments to 
compare, the confidence limits are much wider than $[d \pm ts(d)]$. One 
excellent article on this point is presented by J. W. Tukey. R. A. 
Fisher⁶ (page 58) advocates the following approximate procedure for 
testing the largest against the smallest in a set of $p$ means: Compute the 
significance probability by the ordinary $t$ test, and multiply this by 
$p(p - 1)/2$. $p(p - 1)/2$ is the number of ways of drawing 2 from $p$ 
means, and we are selecting the most extreme of these combinations; 
since the probability of drawing this extreme set is $2/p(p - 1)$, it seems 
reasonable to divide the probability of obtaining such a difference (by use 
of the $t$ test) on the average by the probability of drawing the extreme set. 
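Fisher's approximate procedure amounts to a single multiplication. The sketch below illustrates it; the function name and the input probability 0.004 are hypothetical, and capping the result at 1 is an added convenience that is not part of the procedure as described.

```python
def extreme_pair_probability(p_value, n_means):
    """Multiply an ordinary t-test probability by the number of pairs, p(p - 1)/2."""
    pairs = n_means * (n_means - 1) // 2
    # Cap at 1.0 so the result can still be read as a probability (an assumption).
    return min(1.0, p_value * pairs)


# Hypothetical: the largest and smallest of 8 means give an ordinary
# t-test probability of 0.004.
adjusted = extreme_pair_probability(0.004, 8)
print("pairs =", 8 * 7 // 2, " adjusted probability =", round(adjusted, 3))
```

With 8 means there are 28 possible pairs, so a difference that looks significant at the .004 level by the ordinary test corresponds to roughly .11 once the selection of the extreme pair is allowed for.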

18.4. Latin-square Designs. A design slightly more complicated 
than the randomized complete-blocks design is the Latin-square design. 
In this design, each treatment is assigned at random within a row and a 
column so that all treatments appear in each row and column. Hence it 
is possible to adjust the error for plot heterogeneity in two directions. Of 
course the rows may be fields and the columns similar locations in these 
fields. The basic design looks as follows for 3 treatments (there then 
must be 3 rows and 3 columns): 

  C1          C2          C3          Total 
  T1 (23)     T2 (17)     T3 (29)       69 
  T2 (16)     T3 (25)     T1 (16)       57 
  T3 (24)     T1 (18)     T2 (12)       54 
  63          60          57           180 

The figures represent yields in an experiment. In the field the rows and 
columns of the basic design would have been randomized. One such field 
arrangement would be: 

  R1    T3    T1    T2 
  R2    T2    T3    T1 
  R3    T1    T2    T3 

In general it is also advisable to assign the treatment numbers at random 
to the treatments. 
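The randomization just described can be sketched in a few lines. The code below assumes a cyclic basic design like the 3 × 3 square shown above, then shuffles rows, columns, and treatment numbers; the function name and the seed are hypothetical. For larger squares this does not sample every possible Latin square with equal chance, so it should be read as an illustration of the steps rather than a complete randomization procedure.

```python
import random


def randomized_latin_square(p, rng):
    """Randomize rows, columns, and treatment numbers of a cyclic p x p basic design."""
    # Cyclic basic design: row i, column j receives treatment (i + j) mod p, plus 1.
    basic = [[(i + j) % p + 1 for j in range(p)] for i in range(p)]
    rows = list(range(p))
    cols = list(range(p))
    labels = list(range(1, p + 1))
    rng.shuffle(rows)      # random order of the rows
    rng.shuffle(cols)      # random order of the columns
    rng.shuffle(labels)    # random assignment of treatment numbers
    return [[labels[basic[i][j] - 1] for j in cols] for i in rows]


square = randomized_latin_square(3, random.Random(18))   # seed is arbitrary
for row in square:
    print(row)
```

Permuting whole rows, whole columns, and treatment labels preserves the defining property that every treatment appears once in each row and once in each column.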

The regression model for this design is: 

$$Y_{ijk} = \mu + \alpha_i + \gamma_j + \tau_{k(ij)} + \epsilon_{ij},$$

where $\alpha_i$ and $\gamma_j$ are fixed row and column effects and $\tau_k$ the fixed treatment 
effect. We note that, once $i$ and $j$ are specified in a particular field 
arrangement, we know $k$. Hence $k$ is a function of $i$ and $j$. For example, 
in the field arrangement mentioned above, if $i = 2$ and $j = 3$, $k = 1$. 
The analysis-of-variance table is similar to that of the randomized 
complete-blocks design except that there are two sets of blocks, rows and 
columns. The analysis of the 3 × 3 example is as follows: 

Source of      Degrees of    Sum of     Mean 
variation      freedom       squares    square 
Rows                2           42        21 
Columns             2            6         3 
Treatments          2          186        93 
Error               2            6         3 

The $F$ value to test for treatment differences is 93/3 = 31 with (2, 2) degrees 
of freedom, which is significant at the $\alpha = .05$ significance level. 
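The sums of squares for the 3 × 3 example can be recomputed from the yields of the basic design. The sketch below assumes the yields and treatment layout shown in the basic design table; the helper name `ss_from_totals` is illustrative.

```python
# Yields and treatment numbers from the 3 x 3 basic design (rows by columns).
yields = [
    [23, 17, 29],
    [16, 25, 16],
    [24, 18, 12],
]
treatment = [
    [1, 2, 3],
    [2, 3, 1],
    [3, 1, 2],
]
p = len(yields)
G = sum(sum(row) for row in yields)
C = G ** 2 / p ** 2                      # correction term for p^2 plots


def ss_from_totals(totals):
    """Sum of squares for a classification whose totals each contain p plots."""
    return sum(x ** 2 for x in totals) / p - C


row_totals = [sum(row) for row in yields]
col_totals = [sum(col) for col in zip(*yields)]
trt_totals = [0] * p
for i in range(p):
    for j in range(p):
        trt_totals[treatment[i][j] - 1] += yields[i][j]

total_ss = sum(y ** 2 for row in yields for y in row) - C
ss_rows = ss_from_totals(row_totals)
ss_cols = ss_from_totals(col_totals)
ss_trt = ss_from_totals(trt_totals)
ss_err = total_ss - ss_rows - ss_cols - ss_trt

df_err = (p - 1) * (p - 2)               # error degrees of freedom
F = (ss_trt / (p - 1)) / (ss_err / df_err)

print("rows:", ss_rows, " columns:", ss_cols)
print("treatments:", ss_trt, " error:", ss_err)
print("F =", F)
```

With only 2 error degrees of freedom, a 3 × 3 square gives a very imprecise estimate of the error variance, which is one reason larger squares are usually preferred.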

18.5. Summary. In order to help the experimenter decide what 
design to use, we are including a summary (Table 18.2) of some of the 
advantages and disadvantages of the three designs discussed in this 
chapter. 

Table 18.2 

Feature of the experiment | Completely randomized | Randomized complete blocks | Latin square 
Replications per treatment | Need not be the same | Same for all treatments | Same for all treatments 
Number of treatments ($p$) | Unlimited, except for plot variability | If too many, may lose advantage of blocks | Usually between 5 and 10, because $r = p$ 
Ease of setting up experiment | Easy | Fairly easy, but must set up blocks of required size | More difficult than other two 
Error degrees of freedom | Maximum number | Lose blocks | Lose both rows and columns 
High plot variability | Quite bad | Takes care of variability in one direction | Very good if variability in two directions 
Test of unequal error variance | Easy | Difficult, but often can pull out sets of error d.f. | Very difficult 
Missing data | No difficulty | Not too bad, but some loss of efficiency | Quite bad if several missing plots 
Effect on analysis of error in assignment of treatments to plots | Minor | May require more complicated analysis | May be badly upset 


EXERCISES 

18.1. Assume you have $n_i$ samples for the $i$th treatment. (a) Show 
that you can obtain an estimate of $\sigma^2$ from each treatment with $(n_i - 1)$ 
degrees of freedom. (b) How can you use this result to test the 
assumption of equal variances? 

18.2. (a) In Example 18.1, use the method of orthogonal polynomials, 
outlined in Chap. 16, to divide the sum of squares for days (SSD) 
into four orthogonal parts, each with 1 degree of freedom (linear, 
quadratic, cubic, and quartic). For example, the polynomial values 
($P'_1$) for the linear effect are $(-2, -1, 0, 1, 2)$. Consider the totals as 
the $Y$ values, and show that $\sum P'_1 Y = 160$. It has been proved that 
$SSR_1 = (\sum P'_1 Y)^2/\sum (P'_1)^2 = 25{,}600/10 = 2{,}560$, but this is larger than 
SSD. Under the null hypothesis of no day differences, show that 
$E(SSR_1) = 4\sigma^2$. Hence to put $SSR_1$ on the same basis as $s^2$ (whose 
expected value is $\sigma^2$), we must divide $SSR_1$ by $r = 4$, so that the linear 
effect is 2,560/4 = 640. (See Fig. 18.1.) 
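The arithmetic quoted in part (a) for the linear effect can be verified directly. This sketch assumes the day totals of Example 18.1 and the linear polynomial values given above; it covers only the linear term and leaves the remaining parts of the exercise untouched.

```python
# Day totals from Example 18.1 and the linear orthogonal-polynomial values.
totals = [130, 160, 175, 160, 210]
linear = [-2, -1, 0, 1, 2]
r = 4                                   # observations in each day total

contrast = sum(c * T for c, T in zip(linear, totals))    # sum of P'_1 * Y
ssr1 = contrast ** 2 / sum(c * c for c in linear)        # on the totals basis
linear_ss = ssr1 / r                                     # on the same basis as s^2

print("contrast =", contrast)
print("SSR_1 =", ssr1, " linear effect =", linear_ss)
```

The division by $r$ is needed because each total is a sum of $r$ observations, so its variance is $r\sigma^2$ rather than $\sigma^2$.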

(b) Complete the analysis of the quadratic, cubic, and quartic effects. 
Is there a significant departure from linearity? Hence what do you 
conclude regarding the day-to-day fluctuations in the quality of the product? 

(c) Suppose we had used the means as the $Y$ values. What changes 
must be made in the above analyses? 

(d) Can you make any general statement of how to analyze by 
orthogonal polynomials $p$ equally spaced sets of observations with $r$ 
observations at each point? 

18.3. (a) Set up the mathematical model for Example 18.2. Solve the 
normal equations, and show that the analysis presented there is correct. 
Show how the observation equations can be used to estimate the effect of 
the .04 per cent treatment. 

(b) Prove that the standard error of the difference between the effects 
of any two treatments is $s\sqrt{1/r_1 + 1/r_2}$, where $r_1$ and $r_2$ are the numbers 
of rats on the two rations. Hence show that the standard error of the 
difference between the gains for rations with no gossypol and .04 per cent is 
9.81. 

(c) What is the standard error of the average of the first two against 
the third ration? The third against the last two? 

(d) How would you test for departure from linearity in this case? 
Graph these results. 

(e) Is there any evidence that the error variance is not the same for all 
treatments? 

18.4. An investigation of North Carolina farmers’ retail produce 
markets was made in 1948.² Data were collected on the dollar value of 
livestock owned by a sample of sellers on these markets, shown in the 
accompanying table. 

Data of Exercise 18.4 

Market           1      2      3      4      5      6      7      8      9     10     11     12     13  Total 
               721    750  3,480    627    469    812    393    369    332    249  1,106  1,703  1,462 
                64    756    293    169    604    271    841    785    842    371  1,702    563  1,088 
               664  1,315    370    976    165  1,100    426    655           361  1,154    714    273 
               134    293    284    109           305         1,947         1,061    962    908  1,080 
               610    865    119    704                                       359            97  2,195 
               546  3,274  1,980  1,557                                     1,026           442    428 
               278                  697                                       477                  310 
                29                                                                                 295 
                                                                                                   178 
Total        3,046  7,253  6,526  4,839  1,238  2,488  1,660  3,756  1,174  3,904  4,924  4,427  7,309 
No. sellers      8      6      6      7      3      4      3      4      2      7      4      6      9     69 
Mean           381  1,209  1,088    691    413    622    553    939    587    558  1,231    738    812  761.5 

(a) Analyze these data to see whether or not there are any real 
differences among markets. 

(b) Compare the mean value per seller for the first nine markets with 
the mean value for markets 10 and 11. 

(c) Is there any evidence of unequal variation for these data? 

18.5. Snedecor and Cox³ analyzed some data on the gain in weight 
(grams) of 149 Wistar rats over a 6 weeks’ period for 4 successive 
generations. The gains for the males and females in each of these generations 
were: 

                          Male                                Female 
Generation        1        2        3        4        1        2        3        4     Total 
No. of rats      21       15       12        7       27       25       23       19       149 
Total gain    3,716    2,422    1,868    1,197    2,957    2,852    2,496    2,029    19,537 
Mean gain    176.95   161.47   155.67   171.00   109.52   114.08   108.52   106.79    131.12 

The total sum of squares adjusted for the mean was $Sy^2 = 176{,}836$. 

(a) Set up the analysis of variance among and within the 8 classes and 
determine whether or not there are significant differences among the 8 
mean gains. 

(b) What is the standard error to test for the difference between the 
mean gain for the males of a given generation and the mean gain for the 
females of the same generation? What would you say regarding the 
differences between the population means for each of the 4 generations? 

(c) Can you make a statement regarding the difference between males 
and females over all 4 generations? 

18.6. The relationship between the vitamin $B_2$ content of turnip greens
and the average soil moisture tension (Chap. 15) can be analyzed as a
completely randomized design with p = 3 and r = 9.

(a) Set up the analysis of variance of the vitamin $B_2$ content with the
three soil moistures as treatments.

(b) Test whether or not there was a significant departure from linearity 
in this case. 

(c) Is there any evidence that the error variance is not the same for the 
three treatments? 

18.7. Derive the observation equations for $t_i$ and $b_j$, and show how
they can be used to obtain the normal equations (Sec. 18.3).

18.8. (a) Set up another randomization plan for Fig. 18.2, and write
down the model equations. How many different randomizations can be
devised for 4 blocks and 3 treatments?

(b) Draw up a randomization plan for Example 18.3.

18.9. In the investigation of farmers’ markets (Exercise 18.4), the
incomes of a sample of regular, former, and potential patrons were studied
at 10 markets with the results given in the accompanying table.


Data of Exercise 18.9
Average Monthly Incomes ($10)

City            Regular   Former   Potential   Total
Asheville†         30        30        28         88
Asheboro           39        36        27        102
Charlotte†         41        37        28        106
Durham†            29        29        24         82
Greensboro†        32        27        22         81
Raleigh†           29        27        29         85
Goldsboro          25        24        20         69
Rocky Mount        27        28        27         82
Franklin           30        29        24         83
Jacksonville       29        28        27         84
Total             311       295       256        862

† Cities of over 50,000 population.

(a) Would you conclude that there are any real income differences
among the three groups of people?

(b) What are the confidence limits for the difference between the
average income of the regular and potential patrons?

(c) Is there any feature of this analysis which might be open to question?

18.10. An analysis was made of the fiber diameters in microns at 6
different regions on the seed coat of the Mexican 128 variety of cotton. 7
The analysis was made on a sample of 10 seeds as shown in the
accompanying table.


Data of Exercise 18.10

                               Region
Plant       1        2        3        4        5        6      Total
A         16.49    17.80    17.54    16.75    17.54    17.54    103.66
B         15.45    15.96    15.71    14.13    14.40    14.40     90.05
C         16.23    15.96    16.49    14.92    14.66    14.92     93.18
D         18.33    17.28    16.49    16.49    17.28    17.80    103.67
E         16.49    18.33    17.54    17.02    17.28    18.06    104.72
F         16.49    17.54    17.05    15.71    15.45    14.66     96.90
G         15.96    15.71    16.23    16.49    15.18    16.49     96.06
H         16.75    16.23    14.66    15.96    13.35    16.75     93.70
I         14.40    18.33    17.02    14.66    15.71    17.02     97.14
J         16.49    17.02    16.75    17.54    15.71    16.49    100.00
Total    163.08   170.16   165.48   159.67   156.56   164.13    979.08

(a) Show that the analysis of variance is correct, and fill in the degrees
of freedom.

Source of variation   Degrees of freedom   Mean square
Plants                                        4.15
Regions                                       2.22
Error                                          .692

(b) What statement would you make regarding region differences?

(c) What is the standard error of a region mean?

(d) What are the confidence limits for the difference between regions 1
and 6?

18.11. Middleton and Chapman conducted an experiment to determine 
the best variety out of eight varieties of oats at Laurinburg, N.C., in 
1940. The yields of grain in grams for a 16-foot row were as shown in 
the accompanying table. 


Data of Exercise 18.11

                          Blocks
Variety      I       II      III      IV       V        Sum      Mean
1           296     357     340      331     348      1,672     334.4
2           402     390     431      340     320      1,883     376.6
3           437     334     426      320     296      1,813     362.6
4           303     319     310      260     242      1,434     286.8
5           469     405     442      487     394      2,197     439.4
6           345     342     358      300     308      1,653     330.6
7           324     339     357      352     220      1,592     318.4
8           488     374     401      338     320      1,921     384.2
Sum       3,064   2,860   3,065    2,728   2,448     14,165     354.1

(a) Determine whether or not there are significant differences among
the varieties. Which variety would you recommend?

(b) What is the efficiency of this design compared with a completely
randomized design?

(c) What is the standard error for the difference between two varietal
means?

(d) Set up a field plan for conducting this experiment.

18.12. Frequently one or more plots in a randomized blocks experiment
will be missing because of some adverse condition such as a washout or
insect infestation. As a result the orthogonal properties of the analysis
are upset, and the statistician must either use an approximate analysis or
run a complete least-squares analysis. Let us assume in the oats
experiment of Exercise 18.11 that the plot for variety 1 in block I had been
washed out. An approximate method of analysis is to set this yield
equal to y, run up the analysis of variance in terms of y and the other
yields, and estimate y by minimizing SSE. Then substitute y for the
missing plot yield, and complete the analysis, decreasing the error degrees
of freedom by one.

(a) Show that $y = (pT + rB - G)/(r - 1)(p - 1)$, where T and B
are the yields for the treatment and block with the missing plot.

(b) Estimate y for the oats experiment, and compute the analysis of
variance.

(c) Make a complete least-squares analysis of the data, regarding the
missing plot as nonexistent; in other words, we have only 4 plots for
variety 1 and 7 plots in block I. Show that you obtain the same value of
SSE as in (b) but that SST in (b) is too large by $(2{,}768 - 7y)^2/56$.

18.13. Prove that $\sigma[(T_i - \bar{T})(B_j - \bar{B})] = 0$ in Sec. 18.3.

18.14. (a) The student should set up the least-squares equations and
the analysis-of-variance table for an r X r Latin-square design and
indicate the expected values of the mean squares in the analysis of variance.

(b) What is the standard error of the difference between any two
treatment means?

(c) Show that the efficiency (neglecting the loss of degrees of freedom)
of the rows in reducing the error variance in an (r X r) Latin-square
design as compared with a randomized complete-blocks design with the r
columns as blocks is given by

$$I = \frac{SSR + (r - 1)^2 s_e^2}{r(r - 1)s_e^2},$$

where SSR is the sum of squares for rows and $s_e^2$ is the error mean square
for the Latin-square design. Similarly the efficiency of the columns is
found by replacing SSR by SSC (sum of squares for columns). Hint:
Use the same method as given in Sec. 18.3 (vi) for the efficiency of the
randomized complete-blocks design as compared with a completely
randomized design.

(d) What is h in this case? What is h'l 

18.15. Given a Latin-square design with r rows, columns, and
treatments. Assume that the data for one plot were lost. Show that the
estimate of the missing value found by minimizing SSE is

$$y = \frac{r(R + C + T) - 2G}{(r - 1)(r - 2)},$$

where R, C, and T are, respectively, the totals of the row, column, and
treatment which contain the missing value.

18.16. A Latin-square design was used at the University of Hawaii to
compare 6 different legume intercycle crops for pineapples. The yields
in 10-gram units are given in the accompanying table.

Data of Exercise 18.16

                                                                Row totals
              B 220    F  98    D 149    A  92    E 282    C 169    1,010
              A  74    E 238    B 158    C 228    F  48    D 188      934
              D 118    C 279    F 118    E 278    B 176    A  65    1,034
              E 295    B 222    A  54    D 104    C 213    F 163    1,051
              C 187    D  90    E 242    F  96    A  66    B 122      803
              F  90    A 124    C 195    B 109    D  79    E 211      808
Col. totals     984    1,051      916      907      864      918    5,640
(a) Make a complete analysis of this experiment, and state which
legume you would recommend for planting.

(b) Use the Tukey method 5 to determine whether or not there are
significant differences among the three top treatments.

(c) Was the Latin-square design useful in this case (see Exercise 18.14)?

(d) Set up another field plan for this experiment.

18.17. Professor John Wishart furnishes us this exercise.

“. . . SS marks the plots which received the same total dressing . . .
received monthly dressings from November to April. Plots which received
. . . cyanamide . . . the numbers denoting the yields of the plots in
pounds . . . you can . . . applying a nitrogenous dressing, of whatever
kind; also . . . determine, by a statistical test, whether some forms of
dressing are more effective than others. Set out a full summary table
with your conclusions.

  D       SS      O       C       S
 72.2    55.4    36.6    67.9    73.0

  O       C       SS      S       D
 36.4    46.9    46.8    54.9    68.5

  SS      S       D       O       C
 71.5    55.6    71.6    67.5    78.4

  S       O       C       D       SS
 68.9    53.2    69.8    79.6    77.2

  C       D       S       SS      O
 82.0    81.0    76.0    87.9    70.9

The sum of squares of the 25 numbers in the table is 113,574.73.”

CHAPTER 19

THE ANALYSIS OF INCOMPLETE-BLOCKS DESIGNS 

19.1. Introduction. In this chapter, we shall discuss the use of designs
in which the number of plots per block is less than the number of
treatments. As before, we assume p treatments, but with only k plots per
block (k < p). Assume further that each treatment is applied to r plots
(each treatment is replicated r times) and that there are a total of q blocks
(q > r). Hence there are a total of N = kq = rp plots in the experiment.
The ith treatment appears in the jth block $n_{ij}$ times ($n_{ij}$ = 0 or 1). Hence

$$\sum_j n_{ij} = r \quad \text{and} \quad \sum_i n_{ij} = k.$$

This design is called an incomplete-blocks design.
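The bookkeeping in this definition can be checked mechanically. The following is a minimal sketch, not part of the original text: it takes the seven-block arrangement listed later in Example 19.3, builds the incidence values $n_{ij}$, and confirms that every row sums to r, every column to k, and N = kq = rp. The variable names are illustrative assumptions.

```python
# Seven treatments A..G in seven blocks of three plots (the first set of
# blocks listed in Example 19.3); names below are illustrative only.
blocks = ["ABC", "ADE", "AFG", "BDF", "BEG", "CDG", "CEF"]
treatments = sorted(set("".join(blocks)))

# n[i][j] = 1 if treatment i appears in block j, otherwise 0.
n = [[1 if t in b else 0 for b in blocks] for t in treatments]

p = len(treatments)                        # number of treatments
q = len(blocks)                            # number of blocks
row_sums = [sum(row) for row in n]         # replications of each treatment
col_sums = [sum(col) for col in zip(*n)]   # plots in each block

r, k = row_sums[0], col_sums[0]
assert all(s == r for s in row_sums)       # sum over j of n_ij = r
assert all(s == k for s in col_sums)       # sum over i of n_ij = k
N = sum(row_sums)                          # total number of plots

print(p, q, r, k, N)
```

Since k = 3 is less than p = 7, no block can hold every treatment, which is the defining feature of the designs in this chapter.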

It might be noted that there are two important situations for which 
these incomplete-blocks designs are used: 

(a) The number of possible plots per block is less than p. Examples of
this are:

(i) Nutrition studies on animals where the block is a litter, the 
individual animal is the plot, and the smallest litter size is 
smaller than the number of rations to be studied. 

(ii) Chemical experiments where the block is a day, an individual 
analysis is the plot, and the number of possible analyses in a 
day is less than the number of treatments. 

(iii) Tasting experiments where the block is a single trial by an
individual taster, the score on a given product is the plot yield,
and the number of different products which a taster can
differentiate on a single trial is less than the number of
products being tasted.

(iv) Education or psychology tests in which the block is the trials 
by a single child on a given day, the plot is a single test on this 
day, and the number of tests being considered is greater than 
the number the child can take on a given day without tiring. 

(v) An engineering experiment in which the block is an oven and
the number of treatments is greater than the number of items
which can be heated at a time.

(b) The number of treatments is so large that enough homogeneous plots
cannot be found for a complete block.

(i) If we have 81 varieties of corn to test, it is often impossible to 
find 81 more or less homogeneous plots to form a complete 
block. 

(ii) For greenhouse experiments, temperature and light conditions 
vary so much that it is desirable to form blocks of rather small 
size, the size often being smaller than needed to include all of 
the pots for a complete block. 

We shall not include many examples of these designs in this book, because 
of a limitation of space. Students interested in more examples should 
consult references 1 to 3. 

19.2. General Least-squares Equations. The experimental model
for this design is

$$Y_{ij} = n_{ij}(\mu + \tau_i + \beta_j + \epsilon_{ij}), \quad i = 1, 2, \ldots, p; \; j = 1, 2, \ldots, q,$$

where $n_{ij}$ = 0 or 1 and $\sum_i \tau_i = \sum_j \beta_j = 0$.

The least-squares equations for this model are

$$m: \quad Nm = G,$$
$$b_j: \quad km + \sum_{i=1}^{p} n_{ij}t_i + kb_j = B_j,$$
$$t_i: \quad rm + rt_i + \sum_{j=1}^{q} n_{ij}b_j = T_i.$$

The reader will note that the block and treatment effects are mixed up in
these equations; this is called confounding of the treatment and block
effects.

The first step in solving for the treatment effects is to adjust the
treatment effects for the block effects by removing the block constants from
the treatment equations. This is accomplished by multiplying the $b_j$
equation by $n_{ij}/k$ and subtracting the sum of all of these altered $b_j$
equations from the $t_i$ equation. The resultant equation is

$$rt_i - \frac{1}{k}\sum_j n_{ij}\sum_l n_{lj}t_l = T_i - \frac{1}{k}\sum_j n_{ij}B_j = A_i,$$

where $A_i$ is called an adjusted treatment total (adjusted for block effects)
and $\sum_i A_i = 0$. The adjusting factor $\sum_j n_{ij}B_j$ is often written as $T_{bi}$, the
total yield of all blocks containing treatment i. Suppose the ith
treatment appears $\lambda_{il}$ times in the same block with the lth treatment. Then

$$\sum_j n_{ij}n_{lj} = \begin{cases} \lambda_{il}, & l \ne i, \\ r, & l = i. \end{cases}$$

Hence we obtain

$$\Big[r(k - 1)t_i - \sum_{l \ne i} \lambda_{il}t_l\Big]\Big/k = A_i, \quad i = 1, 2, \ldots, p.$$

The values of the $t_i$ can be solved only by the simultaneous solution of
the p equations in $A_i$, with the linear restraint, $\sum_i t_i = 0$. One method of
solving these p equations is to set $t_i = t_i'' + t_p$, so that $t_p'' = 0$ and
$t_p = -\sum_{i=1}^{p-1} t_i''/p$. Hence

$$A_i = \Big[r(k - 1)t_i'' - \sum_{l \ne i} \lambda_{il}t_l''\Big]\Big/k, \quad i = 1, 2, \ldots, p - 1,$$

since $\sum_{l \ne i} \lambda_{il} = r(k - 1)$. We see that

$$t_i'' = \sum_{l=1}^{p-1} c_{il}A_l,$$

where the $c_{il}$ are the elements of the inverse matrix of the coefficients of
$t''$ in the (p - 1) equations in $A_i$.

In this case

$$SST(adj) = \sum_{i=1}^{p-1} t_i''A_i = \sum_{i=1}^{p-1} (t_i - t_p)A_i = \sum_{i=1}^{p} t_iA_i,$$

since $\sum_i A_i = 0$. It can be shown that

$$E[MST(adj)] = \sigma^2 + \theta\{\tau_i\}/(p - 1),$$

where $\theta$ is 0 when $\{\tau_i = 0\}$. Also,

$$\sigma^2(t_i'') = c_{ii}\sigma^2 \quad \text{and} \quad \sigma^2(t_i'' - t_l'') = (c_{ii} - 2c_{il} + c_{ll})\sigma^2,$$

where $\hat{\sigma}^2 = s^2 = SSE/(N - p - q + 1)$.

If only a test of the null hypothesis $\{\tau_i = 0\}$ is desired, the forward
solution of the abbreviated Doolittle method is sufficient, where the A
values replace the G values in the computations of Sec. 15.3 and the
coefficients of $t_i''$ and $t_l''$ are the elements of the original matrix. An example
of this computing procedure will be given in Chap. 20 (Example 20.4).
No detailed computing techniques are given in the present chapter,
because almost all incomplete-blocks designs which are used have certain
restrictions to make the analysis simpler. Some of these designs will be
discussed in succeeding sections of this chapter.

19.3. Balanced Incomplete-blocks Designs. If every treatment
appears with every other treatment in the same block an equal number of
times, say $\lambda$, the incomplete-blocks design is said to be balanced (all
$\lambda_{il} = \lambda$). In this case the adjusted treatment equation becomes

$$\frac{r(k - 1)t_i - \lambda\sum_{l \ne i} t_l}{k} = \frac{r(k - 1) + \lambda}{k}\,t_i = A_i,$$

because $\sum_{i=1}^{p} t_i = 0$, so that $\sum_{l \ne i} t_l = -t_i$. If we set
$[r(k - 1) + \lambda]/k = rE_f$,

$$t_i = A_i/rE_f.$$

$E_f$ is called the efficiency factor for the incomplete-blocks design, since
the number of effective replications is $rE_f$, instead of r, where $E_f < 1$.
We see that the number of blocks in this case is

$$q = \lambda C(p,2)/C(k,2) = \lambda p(p - 1)/k(k - 1),$$

where C(a,b) is the number of combinations of a things taken b at a time.
Hence $\lambda p(p - 1) = qk(k - 1) = rp(k - 1)$, and

$$E_f = \frac{r(k - 1) + \lambda}{rk} = \frac{\lambda p}{rk} = \frac{p(k - 1)}{k(p - 1)}.$$

We might say that a necessary condition for “balance” is that

$$\lambda p(p - 1) = qk(k - 1) = rp(k - 1).$$

However, it may not be possible to construct a balanced design even
though this condition is met.
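The relations among p, k, r, q, $\lambda$, and $E_f$ are easy to tabulate. The sketch below is an illustration only (the function names are assumptions, not from the text): it computes q, $\lambda$, and $E_f$ from p, k, and r using the identities just given, and reports whether q and $\lambda$ come out as whole numbers. As noted above, passing this check does not guarantee that a balanced design exists.

```python
from fractions import Fraction


def bibd_parameters(p, k, r):
    """Return (q, lam, E_f) for p treatments, k plots per block, r replications.

    q and lam are exact fractions; a balanced design requires both to be
    whole numbers (a necessary condition, not a sufficient one).
    """
    q = Fraction(r * p, k)                     # from N = kq = rp
    lam = Fraction(r * (k - 1), p - 1)         # from lam*p*(p-1) = r*p*(k-1)
    e_f = Fraction(p * (k - 1), k * (p - 1))   # efficiency factor
    return q, lam, e_f


def meets_necessary_condition(p, k, r):
    """True when both q and lam are whole numbers."""
    q, lam, _ = bibd_parameters(p, k, r)
    return q.denominator == 1 and lam.denominator == 1


q, lam, e_f = bibd_parameters(9, 3, 4)
# The two expressions for E_f given in the text agree.
assert e_f == (4 * (3 - 1) + lam) / (4 * 3)
print(q, lam, e_f)
```

For p = 9, k = 3, r = 4 this gives q = 12, $\lambda$ = 1, and $E_f$ = 3/4, the values used in Example 19.1.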

The adjusted mean yield for the ith treatment is

$$t_i' = t_i + m = A_i/rE_f + G/rp = [k(p - 1)A_i + (k - 1)G]/rp(k - 1).$$

In Sec. 15.3 (page 199), it was indicated that the reduction due to any
fixed variate, $X_i$, adjusted for all previous fixed variates is given by $A_{iy}B_{iy}$,
where $A_{iy} = Sy$($x_i$ adjusted for all previous fixed variates in the matrix)
and $B_{iy}$ was the regression coefficient for $X_i$. In the balanced
incomplete-blocks design, the regression coefficient $t_i = A_i/rE_f$, where $A_i$ is the sum
of cross products, adjusted for block effects. Hence the treatment sum
of squares adjusted for block effects is

$$SST(adj) = \sum_i A_i^2/rE_f.$$

A more formal proof would be the following:

The total reduction in sum of squares due to blocks and treatments is

$$SSR = \sum_j b_jB_j + \sum_i t_iT_i.$$

If the treatment effects are omitted from the model,

$$SSR' \text{ (omitting treatments)} = \sum_j (B_j - \bar{B})^2/k = \sum_j B_j(B_j - \bar{B})/k.$$

Hence the sum of squares due to treatments is

$$SST(adj) = SSR - SSR'$$
$$= \sum_j \Big(b_j - \frac{B_j - \bar{B}}{k}\Big)B_j + \sum_i A_i\Big(A_i + \frac{1}{k}\sum_j n_{ij}B_j\Big)\Big/rE_f$$
$$= \sum_i A_i^2/rE_f + \sum_j B_j\Big[b_j - \frac{B_j - \bar{B}}{k} + \frac{1}{k}\sum_i n_{ij}t_i\Big],$$

where $\bar{B} = G/q$. From the G and $B_j$ least-squares equations (Sec. 19.2),
we see that $b_j + (1/k)\sum_i n_{ij}t_i = (B_j - \bar{B})/k$. Hence $SST(adj) = \sum_i A_i^2/rE_f$.

Also, SSE = $Sy^2$ - SSR with (N - p - q + 1) degrees of freedom.

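It might be noted that

$$B_j - \bar{B} = k\beta_j + \sum_i n_{ij}\tau_i + \Big[\sum_i n_{ij}\epsilon_{ij} - \frac{1}{q}\sum_i\sum_j n_{ij}\epsilon_{ij}\Big],$$
$$A_i = rE_f\tau_i + \Big[\sum_j n_{ij}\epsilon_{ij} - \frac{1}{k}\sum_j n_{ij}\sum_l n_{lj}\epsilon_{lj}\Big],$$
$$G = rp\mu + \sum_i\sum_j n_{ij}\epsilon_{ij}.$$

It can be shown that†

$$E[(B_j - \bar{B})^2] = \Big(k\beta_j + \sum_i n_{ij}\tau_i\Big)^2 + k(q - 1)\sigma^2/q,$$
$$E(A_i^2) = (rE_f\tau_i)^2 + r(k - 1)\sigma^2/k,$$
$$E(G^2) = (rp\mu)^2 + rp\sigma^2,$$
$$\sigma(A_iA_l) = -\lambda\sigma^2/k, \quad i \ne l,$$
$$\sigma[(B_j - \bar{B})A_i] = 0 = \sigma[(B_j - \bar{B})G] = \sigma(A_iG).$$

† These expectations are based on the assumption that the blocks are fixed. If
the block effects are assumed to be random, more efficient estimates of the treatment
means can be obtained by utilizing the so-called recovery of interblock information.
This estimation procedure was first introduced by Yates 4 and will be discussed in
Sec. 24.1.
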
It is easy to see that the one-tailed F test is appropriate to test

$$H_0: \{\tau_i = 0\}$$

with $\sigma^2$ estimated by the error sum of squares divided by (N - p - q + 1).

$$\sigma^2(t_i) = (k - 1)\sigma^2/rkE_f^2.$$
$$\sigma^2(t_i') = \sigma^2(t_i) + \sigma^2/rp.$$
$$\sigma^2(t_i - t_l) = \sigma^2(t_i' - t_l') = [2r(k - 1) + 2\lambda]\sigma^2/k(rE_f)^2 = 2\sigma^2/rE_f.$$

The problem of estimating the efficiency of a balanced incomplete-blocks
design as compared with other designs will be discussed in Chap.
24, except for a special type of balanced design which is discussed below.

Example 19.1. A special type of a balanced incomplete-blocks
design is one with $p = k^2$ treatments, called a balanced lattice design.‡
This design can be set up in r complete replications of k blocks each (with
k treatments per block); hence, $q = rk$, $\lambda = r(k - 1)/(p - 1) = r/(k + 1)$
and $E_f = k/(k + 1)$. A balanced lattice design with p = 9 (k = 3) and
r = 4 was used to test 9 rations fed to rats. Hence $\lambda = 1$, and $E_f = 3/4$.
The gains for this experiment were (the ration numbers are in parentheses):

Replication I                              Replication II
  j                              B_j         j                              B_j
  1   (1) 20   (4) 15   (7) 11    46         4   (7)  8   (8) 12   (9) 16    36
  2   (3)  8   (6) 18   (9) 26    52         5   (1) 20   (2)  2   (3)  2    24
  3   (2) 18   (5) 16   (8)  2    36         6   (4) 20   (5)  6   (6)  2    28
  Total                          134         Total                           88

Replication III                            Replication IV
  j                              B_j         j                              B_j
  7   (1) 13   (9) 19   (5) 14    46        10   (5) 19   (7) 23   (3)  6    48
  8   (8) 14   (4) 34   (3)  2    50        11   (1) 22   (6) 12   (8)  2    36
  9   (6) 14   (2) 20   (7) 14    48        12   (9) 27   (2)  7   (4) 20    54
  Total                          144         Total                          138

‡ A more complicated lattice with $p = k^3$ treatments is called a cubic lattice.

Ration   Total gain = T_i   Σ_j n_ij B_j = T_bi   3A_i = 3T_i - T_bi   t'_i = (12A_i + G)/36
1              75                  152                    73                  22.11
2              47                  162                   -21                  11.67
3              18                  174                  -120                    .67
4              89                  178                    89                  23.89
5              55                  158                     7                  14.78
6              46                  164                   -26                  11.11
7              56                  178                   -10                  12.89
8              30                  158                   -68                   6.44
9              88                  188                    76                  22.44
Total         504 = G            1,512 = 3G                0                 126.00



In this experiment, a constant for replications can be inserted in the
model and a sum of squares for replications removed from the block
sum of squares. The total sum of squares for blocks is

$$SSB = \frac{(46)^2 + (52)^2 + \cdots + (36)^2 + (54)^2}{3} - \frac{(504)^2}{36} = 7{,}402.67 - 7{,}056 = 346.67.$$

The sum of squares for replications is

$$\frac{(134)^2 + \cdots + (138)^2}{9} - 7{,}056 = 219.56.$$

Hence the sum of squares for blocks in replications is
346.67 - 219.56 = 127.11.

The sum of squares for treatments (adjusted for blocks) is

$$SST(adj) = \frac{(73)^2 + (-21)^2 + \cdots + (76)^2}{(9)(3)} = 1{,}456.15.$$

The error sum of squares is

$$Sy^2 - SSB - SST(adj) = 9{,}316 - 7{,}056 - 346.67 - 1{,}456.15 = 457.18.$$

Hence the analysis-of-variance table is:


Source of variation        Degrees of freedom   Sum of squares   Mean square   E(MS)
Replications                        3                219.56          73.19
Blocks (in replications)            8                127.11          15.89
Treatments (adj)                    8              1,456.15         182.02     $\sigma^2 + 3\theta(\tau)$
Error                              16                457.18          28.57     $\sigma^2$
Total                              35              2,260

To test for treatment differences, F = 182.02/28.57 = 6.37 with 8 and
16 degrees of freedom, for which $\alpha < .001$. The standard error of the
difference between any two adjusted treatment means $(t_i' - t_l')$ is

$$\sqrt{2s^2/rE_f} = \sqrt{2(28.57)/3} = 4.36.$$

In this case we can compute the efficiency of this design relative to a
randomized complete-blocks design with 4 replications (complete blocks).
First we compute SST (unadjusted) and then the estimated
randomized-blocks error as the difference $Sy^2$ - SST - SS (replications).

Source of variation        Degrees of freedom   Sum of squares   Mean square
Replications                        3                219.56
Treatments                          8              1,194.00
Randomized blocks error            24                846.44          35.27
Total                              35              2,260.00

The effective incomplete-blocks error is $28.57/E_f = 38.09$. Hence the
efficiency of the incomplete-blocks design is only 35.27/38.09 = .93. In
addition there is another loss due to fewer error degrees of freedom (see
reference 1, page 28).
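The arithmetic of Example 19.1 can be retraced with a short script. This is only a sketch of the hand computation described above, using the gains from the four replications; the variable names are assumptions introduced for illustration. It forms the block totals, the adjusted treatment totals $A_i = T_i - T_{bi}/k$, the sums of squares, the adjusted means, and the efficiency relative to randomized complete blocks.

```python
# Gains from Example 19.1, one inner list per block: (ration, gain) pairs.
replications = [
    [[(1, 20), (4, 15), (7, 11)], [(3, 8), (6, 18), (9, 26)], [(2, 18), (5, 16), (8, 2)]],
    [[(7, 8), (8, 12), (9, 16)], [(1, 20), (2, 2), (3, 2)], [(4, 20), (5, 6), (6, 2)]],
    [[(1, 13), (9, 19), (5, 14)], [(8, 14), (4, 34), (3, 2)], [(6, 14), (2, 20), (7, 14)]],
    [[(5, 19), (7, 23), (3, 6)], [(1, 22), (6, 12), (8, 2)], [(9, 27), (2, 7), (4, 20)]],
]
p, k, r = 9, 3, 4
blocks = [block for rep in replications for block in rep]
q, N = len(blocks), p * r
E_f = p * (k - 1) / (k * (p - 1))                  # efficiency factor, 3/4 here

yields = [y for block in blocks for _, y in block]
G = sum(yields)
correction = G**2 / N
total_ss = sum(y**2 for y in yields) - correction  # Sy^2

B = [sum(y for _, y in block) for block in blocks]  # block totals B_j
rep_totals = [sum(y for block in rep for _, y in block) for rep in replications]
T = dict.fromkeys(range(1, p + 1), 0)               # treatment totals T_i
T_b = dict.fromkeys(range(1, p + 1), 0)             # totals of blocks containing i
for block, B_j in zip(blocks, B):
    for i, y in block:
        T[i] += y
        T_b[i] += B_j
A = {i: T[i] - T_b[i] / k for i in T}               # adjusted treatment totals

ssb = sum(b**2 for b in B) / k - correction
ss_rep = sum(t**2 for t in rep_totals) / p - correction
sst_adj = sum(a**2 for a in A.values()) / (r * E_f)
sse = total_ss - ssb - sst_adj
mse = sse / (N - p - q + 1)
F = (sst_adj / (p - 1)) / mse
adj_means = {i: A[i] / (r * E_f) + G / N for i in T}
se_diff = (2 * mse / (r * E_f)) ** 0.5

# Comparison with randomized complete blocks (replications as blocks).
sst_unadj = sum(t**2 for t in T.values()) / r - correction
rb_mse = (total_ss - sst_unadj - ss_rep) / ((r - 1) * (p - 1))
efficiency = rb_mse / (mse / E_f)

print(round(sst_adj, 2), round(mse, 2), round(F, 2), round(efficiency, 2))
```

Because the script carries full precision, the error sum of squares can differ from the hand-computed 457.18 in the last decimal place.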

19.4. The Simple Lattice Designs. One special type of a nonbalanced
incomplete-blocks design is a lattice design with fewer replications than
are needed for balance. A lattice design with two replications is called a
simple lattice, one with three replications a triple lattice, etc. It was
shown in Example 19.1 that a lattice is balanced if the number of
replications r = k + 1. Hence, if r < k + 1, the lattice is not balanced. The
analysis of a nonbalanced lattice is much simpler than the general analysis
presented in Sec. 19.2, because of a special method of allocating the
treatments to the blocks. The $k^2$ treatments are assigned at random one of
the following treatment numbers:

11   12   . . .   1k
21   22   . . .   2k
. . . . . . . . . .
k1   k2   . . .   kk

If a simple lattice is used (r = 2), the row combinations (11,12, . . . ,1k;
21,22, . . . ,2k; etc.) are assigned to separate blocks in one replication,
X, and the column combinations (11,21, . . . ,k1; 12,22, . . . ,k2; etc.)
assigned to separate blocks in the second replication, Y. If a triple
lattice is used, an additional replication, Z, is taken from the diagonals
(11,22, . . . ,kk; 21,32, . . . ,1k; etc.). For a complete discussion of
this design, see references 2 and 7.
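The row, column, and diagonal groupings can be generated directly from the two-subscript treatment numbers. The sketch below is illustrative only; the function name and the use of (row, column) tuples in place of two-digit numbers are assumptions. It omits the randomization of treatments, blocks, and plots that an actual experiment would require.

```python
def lattice_replications(k, triple=False):
    """Blocks of a simple (or triple) lattice for k*k treatments.

    Treatments are (row, column) pairs with 1-based subscripts, standing
    for the treatment numbers 11, 12, ..., kk used in the text.
    """
    X = [[(i, j) for j in range(1, k + 1)] for i in range(1, k + 1)]  # rows
    Y = [[(i, j) for i in range(1, k + 1)] for j in range(1, k + 1)]  # columns
    reps = {"X": X, "Y": Y}
    if triple:
        # Diagonal d holds (1+d, 1), (2+d, 2), ..., with row subscripts
        # wrapping around after k.
        Z = [[((j - 1 + d) % k + 1, j) for j in range(1, k + 1)]
             for d in range(k)]
        reps["Z"] = Z
    return reps


reps = lattice_replications(3, triple=True)
for name, rep in reps.items():
    print(name, rep)
```

For k = 3 the second diagonal block comes out as 21, 32, 13, in agreement with the pattern (21,32, . . . ,1k) quoted above.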

Let us consider the analysis of a simple lattice experiment (r = 2). We
shall designate the treatment effects as $t_{ij}$ (i, j = 1,2, . . . ,k) and
similarly for treatment totals ($T_{ij}$) and adjusted treatment totals ($A_{ij}$). Let

$$\sum_j t_{ij} = a_i, \qquad \sum_i t_{ij} = d_j.$$

The yield of the (i,j) treatment in the X replication is designated as $X_{ij}$
and in the Y replication as $Y_{ij}$. Therefore $T_{ij} = X_{ij} + Y_{ij}$. And finally
we shall use the notation

$$\sum_j X_{ij} = X_{i.}, \qquad \sum_i X_{ij} = X_{.j}, \qquad \sum_i\sum_j X_{ij} = X_{..},$$

and similarly for Y. Hence

$$A_{ij} = T_{ij} - \frac{1}{k}(X_{i.} + Y_{.j}).$$

Also we note that $\lambda$ will be 1 for those treatments with the same row or
column subscript and 0 elsewhere. Hence

$$A_{ij} = \Big[2(k - 1)t_{ij} - \sum_{j' \ne j} t_{ij'} - \sum_{i' \ne i} t_{i'j}\Big]\Big/k = 2t_{ij} - (a_i + d_j)/k.$$

Since $\sum_i a_i = \sum_j d_j = 0$,

$$\sum_j A_{ij} = Y_{i.} - Y_{..}/k = a_i, \qquad \sum_i A_{ij} = X_{.j} - X_{..}/k = d_j.$$

We have shown that the sum of squares for treatments (adjusted for
blocks) is

$$SST(adj) = \sum_i\sum_j A_{ij}t_{ij} = \tfrac{1}{2}\sum_i\sum_j A_{ij}[A_{ij} + (a_i + d_j)/k]$$
$$= \tfrac{1}{2}\sum_i\sum_j \{[T_{ij} - (X_{i.} + Y_{.j})/k] \cdot [T_{ij} - (X_{i.} + Y_{.j} - Y_{i.} - X_{.j})/k]\}.$$

By expanding these products term by term, we have

$$SST(adj) = \tfrac{1}{2}\Big\{\sum_i\sum_j T_{ij}^2 - (X_{..} - Y_{..})^2/k^2 - \Big[\sum_i T_{i.}^2 + \sum_j T_{.j}^2\Big]\Big/k + 2\Big[\sum_j X_{.j}^2 + \sum_i Y_{i.}^2\Big]\Big/k\Big\},$$

where $T_{i.} = X_{i.} + Y_{i.}$ and $T_{.j} = X_{.j} + Y_{.j}$.

The error sum of squares is obtained by computing the unadjusted
block sum of squares, SSB, and then subtracting to obtain

$$SSE = Sy^2 - SSB - SST(adj).$$

The analysis of variance is:

Source of variation   Degrees of freedom   Sum of squares   Mean square
Blocks                     2k - 1              SSB
Treatments (adj)           k^2 - 1             SST(adj)        MST(adj)
Error                      (k - 1)^2           SSE             s^2

The blocks sum of squares can be divided into two parts, for replications
(1 degree of freedom) and blocks in replications [2(k - 1) degrees of
freedom]. The replication part is independent of treatment effects and
equals $(X_{..} - Y_{..})^2/2k^2$. The blocks-in-replication part is a mixture of
block and treatment effects as well as error and is simply
SSB - $(X_{..} - Y_{..})^2/2k^2$.†

In this case it is not possible to give a single figure for the variance of
the difference between two adjusted means, since some treatments will
appear together in the same block and others never appear together in
the same block. It seems reasonable that the variance of the difference
will be lower for those treatments appearing together in the same block.
Let us first consider two adjusted treatment means $t_{1j}'$ and $t_{2j}'$, which
appear in the same block.

† Most analyses of lattice designs now make use of the recovery of interblock
information, mentioned in the footnote on page 253. See references 1 and 8 for some
examples of the theory and computational procedures.

$$t_{1j}' = \tfrac{1}{2}[X_{1j} + Y_{1j} - (X_{1.} - Y_{1.} - X_{.j} + Y_{.j})/k],$$
$$t_{1j}' - t_{2j}' = \tfrac{1}{2}[X_{1j} + Y_{1j} - X_{2j} - Y_{2j} - (X_{1.} - X_{2.} - Y_{1.} + Y_{2.})/k]$$
$$= \tfrac{1}{2}\Big[\Big(\frac{k - 1}{k}\Big)(X_{1j} - X_{2j}) - \Big(\frac{1}{k}\Big)\sum_{j' \ne j}(X_{1j'} - X_{2j'}) + \Big(\frac{k + 1}{k}\Big)(Y_{1j} - Y_{2j}) + \Big(\frac{1}{k}\Big)\sum_{j' \ne j}(Y_{1j'} - Y_{2j'})\Big],$$
$$\sigma^2(t_{1j}' - t_{2j}') = (2/4k^2)[(k - 1)^2 + (k - 1) + (k + 1)^2 + (k - 1)]\sigma^2 = (k + 1)\sigma^2/k.$$

Now consider two treatments which do not appear together in the same
block, such as $t_{11}'$ and $t_{22}'$. In this case the $X_{.j}$ and $Y_{.j}$ do not cancel out,
as above, and we have

$$\sigma^2(t_{11}' - t_{22}') = (k + 2)\sigma^2/k.$$

We note that there are $k^2(k^2 - 1)/2$ possible treatment comparisons,
of which $k^2(k - 1)$ are between treatments in the same block and
$\tfrac{1}{2}k^2(k - 1)^2$ between treatments not in the same block. Hence we might
use as an average variance of the difference between adjusted treatment
effects,

$$\frac{[k^2(k - 1)(k + 1)/k + k^2(k - 1)^2(k + 2)/2k]\sigma^2}{k^2(k^2 - 1)/2} = \frac{(k + 3)\sigma^2}{k + 1}.$$

The factor (k + 1)/(k + 3) is the efficiency factor, $E_f$, for this design.
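The averaging step can be confirmed with exact arithmetic. The following sketch (the function name is an assumption, and variances are expressed in units of $\sigma^2$) weights the two variances by the number of comparisons of each kind and checks that the counts add up to the total number of treatment pairs.

```python
from fractions import Fraction


def simple_lattice_variances(k):
    """Average variance of a difference (in units of sigma^2) and its reciprocal."""
    same_block = Fraction(k + 1, k)      # treatments that share a block
    diff_block = Fraction(k + 2, k)      # treatments never in the same block
    n_same = k * k * (k - 1)             # comparisons within a block
    n_diff = k * k * (k - 1) ** 2 // 2   # comparisons across blocks
    n_total = k * k * (k * k - 1) // 2   # all pairs of the k^2 treatments
    assert n_same + n_diff == n_total
    average = (n_same * same_block + n_diff * diff_block) / n_total
    return average, 1 / average          # average variance, efficiency factor


for k in range(2, 8):
    average, efficiency = simple_lattice_variances(k)
    print(k, average, efficiency)
```

For k = 3 the average variance is 3/2 of $\sigma^2$ and the efficiency factor is 2/3.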

Example 19.2. To illustrate the computing techniques for the simple
lattice, we shall use the first two replications of the (3 X 3) experiment
considered in Example 19.1. We shall designate the treatments as 11,
12, 13, . . . , 33 instead of 1, 2, . . . , 9, with X for Replication II and
Y for Replication I. The treatment and X and Y totals are (treatment
numbers in parentheses):

                                          T_i.    i or j   X_.j   Y_i.   X_i.   Y_.j
         (11) 40    (12) 20    (13) 10     70        1      48     46     24     46
         (21) 35    (22) 22    (23) 20     77        2      20     49     28     36
         (31) 19    (32) 14    (33) 42     75        3      20     39     36     52
T_.j          94         56         72    222     Total     88    134     88    134

$$Sy^2 = 3{,}706 - \frac{(222)^2}{18} = 968,$$

$$SST(adj) = \tfrac{1}{2}\Big\{(40)^2 + \cdots + (42)^2 - \frac{(88 - 134)^2}{9} - \frac{(70)^2 + (77)^2 + \cdots + (72)^2}{3} + \frac{2[(48)^2 + (20)^2 + \cdots + (39)^2]}{3}\Big\}$$
$$= \tfrac{1}{2}[6{,}530 - 235.11 - 11{,}203.33 + 6{,}094.67] = 593.11,$$

$$SSB = \frac{(24)^2 + (28)^2 + \cdots + (52)^2}{3} - \frac{(222)^2}{18} = 186.$$

The analysis of variance is:


Source of variation        Degrees of freedom   Sum of squares   Mean square
Replications                        1                 118
Blocks (in replications)            4                  68
Treatments (adj)                    8                 593             74
Error                               4                 189             47.2
Total                              17                 968

Treatments (unadjusted)             8                 527
Randomized blocks error             8                 323             40.4

The estimated average standard error of the difference between two
adjusted treatment effects is $\sqrt{6s^2/4} = 8.4$. $t_{ij}' = [kT_{ij} - \delta_{ij}]/2k$, where
$\delta_{ij} = (X_{i.} - Y_{i.} - X_{.j} + Y_{.j})$. The $\delta_{ij}$ and $t_{ij}'$ are presented below.

δ_ij      j = 1    j = 2    j = 3    Total
i = 1      -24       -6       10      -20
i = 2      -23       -5       11      -17
i = 3       -5       13       29       37
Total      -52        2       50        0

t'_ij     j = 1    j = 2    j = 3    Total
i = 1     24.0     11.0      3.3     38.3
i = 2     21.3     11.8      8.2     41.3
i = 3     10.3      4.8     16.2     31.3
Total     55.7     27.7     27.7    111.0

The efficiency of this design as compared with randomized complete
blocks is only $40.4 \div 3(47.2)/2 = 0.57$, plus the fact that we have only 4
instead of 8 error degrees of freedom. This result has no practical use
because no experiment should be planned if it has only 4 degrees of
freedom for the estimate of the error variance.
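The sums of squares in Example 19.2 can be reproduced as follows. This is a sketch only, with variable names chosen for illustration; X holds the Replication II gains and Y the Replication I gains, arranged by the two-subscript treatment numbers. It evaluates SST(adj) both from the expanded formula and directly as $\sum A_{ij}t_{ij}$ with $t_{ij} = \tfrac{1}{2}[A_{ij} + (a_i + d_j)/k]$, so the two routes of Sec. 19.4 can be compared.

```python
k = 3
# X[i][j], Y[i][j]: gain of treatment (i+1)(j+1) in Replication II (X, rows
# as blocks) and Replication I (Y, columns as blocks) of Example 19.1.
X = [[20, 2, 2], [20, 6, 2], [8, 12, 16]]
Y = [[20, 18, 8], [15, 16, 18], [11, 2, 26]]
idx = range(k)

T = [[X[i][j] + Y[i][j] for j in idx] for i in idx]
X_row = [sum(X[i]) for i in idx]                     # X_i. : block totals in X
Y_row = [sum(Y[i]) for i in idx]                     # Y_i.
X_col = [sum(X[i][j] for i in idx) for j in idx]     # X_.j
Y_col = [sum(Y[i][j] for i in idx) for j in idx]     # Y_.j : block totals in Y
T_row = [X_row[i] + Y_row[i] for i in idx]
T_col = [X_col[j] + Y_col[j] for j in idx]
X_tot, Y_tot = sum(X_row), sum(Y_col)
G = X_tot + Y_tot

correction = G**2 / (2 * k * k)
total_ss = sum(v**2 for rep in (X, Y) for row in rep for v in row) - correction

# Expanded formula for SST(adj).
sst_adj = 0.5 * (
    sum(T[i][j] ** 2 for i in idx for j in idx)
    - (X_tot - Y_tot) ** 2 / k**2
    - (sum(t**2 for t in T_row) + sum(t**2 for t in T_col)) / k
    + 2 * (sum(x**2 for x in X_col) + sum(y**2 for y in Y_row)) / k
)

# Direct route: sum of A_ij * t_ij.
A = [[T[i][j] - (X_row[i] + Y_col[j]) / k for j in idx] for i in idx]
a = [sum(A[i]) for i in idx]
d = [sum(A[i][j] for i in idx) for j in idx]
sst_direct = sum(
    A[i][j] * 0.5 * (A[i][j] + (a[i] + d[j]) / k) for i in idx for j in idx
)

ssb = (sum(x**2 for x in X_row) + sum(y**2 for y in Y_col)) / k - correction
ss_rep = (X_tot - Y_tot) ** 2 / (2 * k * k)
sse = total_ss - ssb - sst_adj
mse = sse / (k - 1) ** 2
delta = [[X_row[i] - Y_row[i] - X_col[j] + Y_col[j] for j in idx] for i in idx]

print(round(sst_adj, 2), round(sst_direct, 2), round(ssb, 2), round(mse, 1))
```

The two values of SST(adj) agree, which is a useful check on the term-by-term expansion.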

If the simple lattice is duplicated several times so that 2d replications
(2kd blocks) are used, the analysis is changed in the following respects:

(i) There are now (2d - 1) degrees of freedom for replications and
2d(k - 1) degrees of freedom for blocks in replications. The differences
between block totals with the same treatments in the blocks are free of
treatment effects and hence indicate real block differences. The block
totals can be analyzed as follows for the X group (and similarly for the
Y group):

                   Degrees of freedom
Replications         d - 1
X groups             k - 1
Block effect         (d - 1)(k - 1)

Hence there are 2(d - 1)(k - 1) degrees of freedom for real block
differences and 2(k - 1) degrees of freedom with block and treatment effects
mixed up.

(ii) Each $X_{ij}$ and $Y_{ij}$ is now a total of d yields and $T_{ij}$ a total of 2d
yields. Hence all sums of squares must be divided by 2d instead of 2
(similarly for means). Also, $\lambda$ will now be d instead of 1.

(iii) The above variances of the differences between two adjusted
treatment effects are divided by d.

(iv) There are now (k - 1)(2dk - k - 1) degrees of freedom for
error.

19.5. Other Lattice Designs. Yates 2 presents the theory for triple 
lattices. We shall not include the details in this book. The essential 
change is that each treatment is given a third subscript as was done for 
the Latin-square design, and a Z replication is introduced. This design 
is also presented in references 1 and 8. If more than three replications 
are used, in almost all cases either the simple or triple lattice is duplicated 
or a balanced design is used. 

It should be indicated that it is also possible to set up a lattice
experiment in a Latin-square design, called a lattice-square design. The
computing details are much more complicated than for the randomized blocks
designs, but the reduction in error variance is often quite large, especially
for very heterogeneous experimental material. Details of these designs
are also presented in reference 1.

Harshbarger 9,10 introduced a new type of lattice design, called the 
rectangular lattice , in which the number of treatments is k(k + 1). This 
design enables the experimenter to utilize the simplicity of the lattice 
without restricting his experiments to exactly k 2 treatments. Cochran 
and Cox 1 present simplified computing techniques for this type of design 
also. A computing manual has been prepared by Robinson and Watson. 11 

Kempthorne and Federer 12 present the general theory of prime-power 
lattices. 

19.6. Methods of Constructing Incomplete-blocks Designs. References
3 and 13 present methods of constructing balanced incomplete-blocks
designs. The construction of the lattice designs is quite simple,
since the experimenter needs only to randomize the treatment numbers,
then the blocks within a replication and the treatments in each block.
We shall present one example of a balanced design which is not a lattice.

Example 19.3. Consider the design for the 7 treatments in blocks of
3 in Exercise 19.3. There are C(7,3) = 35 combinations of these 7
treatments in groups of 3, but if we restrict ourselves to enough combinations
so that each treatment appears once and only once with every other
treatment ($\lambda = 1$), we can manage with 7 blocks. The method of
construction is one of gradual elimination. Obviously there are many ways
of selecting the set of 7 from the set of 35 combinations. One such set of
7 is

ABC, ADE, AFG, BDF, BEG, CDG, CEF.

Another was used in Exercise 19.3. Probably the best procedure to use
is to select some basic rule, such as the one given in reference 1. The
following set

ABD, BCE, CDF, DEG, EFA, FGB, GAC

was obtained by cyclic substitution (the first letter is one removed from
the second and the third two removed from the second, where F + 2 = A,
for example). Then randomly assign the 7 treatments to the 7 letters.
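The cyclic rule of Example 19.3 can be sketched in a few lines of Python. This is only an illustration, not the rule given in reference 1: the function names, the offsets (0, 1, 3) that encode "one removed, then two removed", and the fixed random seed are assumptions made here. The sketch builds the seven blocks, checks that every pair of letters shares a block exactly once ($\lambda = 1$), and then assigns treatment numbers to letters at random.

```python
from collections import Counter
from itertools import combinations
import random

LETTERS = "ABCDEFG"

def cyclic_blocks(letters=LETTERS, offsets=(0, 1, 3)):
    """Shift the base block (A, B, D) cyclically through the alphabet."""
    n = len(letters)
    return ["".join(letters[(start + off) % n] for off in offsets)
            for start in range(n)]

def is_balanced(blocks, letters=LETTERS, lam=1):
    """True if every unordered pair of treatments occurs together lam times."""
    counts = Counter()
    for block in blocks:
        for pair in combinations(sorted(block), 2):
            counts[pair] += 1
    return all(counts[pair] == lam for pair in combinations(sorted(letters), 2))

cyclic = cyclic_blocks()
first_set = ["ABC", "ADE", "AFG", "BDF", "BEG", "CDG", "CEF"]
print(cyclic)
print(is_balanced(cyclic), is_balanced(first_set))

# Random assignment of treatment numbers 1..7 to the letters
# (the seed is arbitrary and used only to make the sketch repeatable).
rng = random.Random(0)
assignment = dict(zip(LETTERS, rng.sample(range(1, 8), 7)))
print(assignment)
```

Both the cyclic set and the set found by gradual elimination pass the same pair-count check, which is the property that makes either one usable as a balanced design.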

19.7. Summary. In Sec. 19.1, we indicated certain experimental
conditions which might make the use of an incomplete-blocks design
desirable. For illustrative purposes, some data from an experiment with
nine treatments have been analyzed. Unless the blocks are so small that
nine experimental units cannot be placed in each block, it is seldom
advisable to use the incomplete-blocks design for so few treatments, unless
enough replications can be used to give at least 20 degrees of freedom for
the error mean square. Even then there is seldom enough gain in
efficiency to warrant the extra computing time. The experimenter needs to
know something about the block-to-block variability of his experimental
material before he can decide whether or not he should use an
incomplete-blocks design. We believe that it is advisable to build up a body of
evidence from continuing experiments in order to help with this decision.
Many engineering experiments and experiments in the physical and social
sciences may make use of the incomplete-blocks designs, because the size
of the blocks is often limited. However, little information is available
on this point to date (see reference 14 for one example in the field of
chemistry).

Since the analyses of most incomplete-blocks data use the recovery of 
interblock information, more will be said on these points in Chap. 24. 
However, it should be warned here that it seems inadvisable to use this 
method of analysis unless there are at least 15 degrees of freedom for the 
block mean square (and preferably at least 25 degrees of freedom). 

EXERCISES 

19.1. A simple example of a balanced incomplete-blocks experiment is 
the following balanced lattice with four treatments, six blocks, and two 
treatments per block: 



[Table: blocks by treatments; the entries are missing from the source text]

(a) Set up the analysis of variance and adjusted treatment means for
this experiment, and test for treatment differences.

(b) Estimate the standard error of the difference between any two
adjusted treatment means.

(c) Show how the replication constants fit into the model, and
determine the expected value of the replication mean square in the analysis of
variance.

(d) Compute an estimate of the efficiency of this design relative to a 
randomized complete-blocks design. 

19.2. Dr. Pauline Paul 5 conducted an experiment to compare the effects
of cold storage on the tenderness and flavor of beef roasts. Six periods of
storage (0, 1, 2, 4, 9, and 18 days) were tested (p = 6). Since the same
cut of meat on each side of an animal was expected to be similar but
different cuts on the same side dissimilar, it was decided to use a balanced
incomplete-blocks design with k = 2, $\lambda = 1$, q = 15, and r = 5. In this
case it was also possible to arrange the cuts in complete replications of 3
cuts from each side. The scores for tenderness of beef were (periods of
storage in parentheses):



Replication I            Replication II           Replication III
                  B_i                      B_i                      B_i
(0)  7  (1)  17    24    (0) 17  (2)  27    44    (0) 10  (4)  25    35
(2) 26  (4)  25    51    (1) 23  (9)  27    50    (1) 26  (18) 37    63
(9) 33  (18) 29    62    (4) 29  (18) 30    59    (2) 24  (9)  26    50
                  137                      153                      148

Replication IV           Replication V
                  B_i                      B_i
(0) 25  (9)  40    65    (0) 11  (18) 27    38
(1) 25  (4)  34    59    (1) 24  (2)  21    45
(2) 34  (18) 32    66    (4) 26  (9)  32    58
                  190                      141

The blocks run across the page.

(a) Make a complete analysis of these data, showing the adjusted 
treatment means, the analysis of variance, the standard error of treatment 
differences, and general conclusions. 

(b) Show how to determine the number of paired cuts of beef (blocks) 
needed for a balanced experiment in this case. 

(c) Suppose it had been possible to pair the cuts into groups of four
like ones, instead of two. What design would you then set up?

(d) Is there any method of determining the trend of tenderness in terms
of storage time?

(e) Was there any appreciable gain in using the incomplete-blocks
design for this problem?

19.3. Moore and Bliss 6 set up a balanced incomplete-blocks design to
compare the toxicity of each of 7 chemicals on Aphis rumicis. The
measure of the toxicity was the logarithm of the dose (+3.806) required for a
95 per cent kill. Since only 3 chemicals could be tested on a given day,
k = 3. Seven days were required to make a balanced design. The
toxicities were as follows:


                           Chemical
Day      A       B       C       D       E       F       G        B_i
1      .465    .602            .443                              1.510
2      .343                            .652    .536              1.531
3              .873    .875            1.142                     2.890
4      .396            .325                            .609      1.330
5              .634                            .409    .417      1.460
6                              .987    .989            .931      2.907
7                      .330    .426            .309              1.065
T_i   1.204   2.109   1.530   1.856   2.783   1.254   1.957     12.693


(a) Can you separate out replications in the analysis of these 
data? 

(b) Show how to determine the number of days required for a balanced
experiment.

(c) Make a complete analysis of these data. 

(d) C and F were basically the same chemical compound. Were they 
different from one another? How did they compare with the standard 
treatment, A? 

(e) How did A compare with all the others, excepting C and F? 

19.4. The data in the accompanying table represent the yields (in
bushels per acre, minus 30 bushels) of a 5 × 5 simple lattice experiment
on soybean varieties with 4 replications. 1 The variety numbers are given
in parentheses.

(a) First analyze only the first two replications of this experiment,
indicating whether or not there are significant varietal effects, the average
standard error of adjusted varietal differences, and the efficiency of using
the lattice design.

(b) Second analyze the entire experiment, making the necessary
adjustments with d = 2.

(c) Show how this experiment might have been set out in the field.

Data of Exercise 19.4

Replication I                                            Block totals
(1)   6   (2)   7   (3)   5   (4)   8   (5)   6               32
(6)  16   (7)  12   (8)  12   (9)  13   (10)  8               61
(11) 17   (12)  7   (13)  7   (14)  9   (15) 14               54
(16) 18   (17) 16   (18) 13   (19) 13   (20) 14               74
(21) 14   (22) 15   (23) 11   (24) 14   (25) 14               68
                                                             289

Replication II
(1)  24   (6)  13   (11) 24   (16) 11   (21)  8               80
(2)  21   (7)  11   (12) 14   (17) 11   (22) 23               80
(3)  16   (8)   4   (13) 12   (18) 12   (23) 12               56
(4)  17   (9)  10   (14) 30   (19)  9   (24) 23               89
(5)  15   (10) 15   (15) 22   (20) 16   (25) 19               87
                                                             392

Replication III
(1)  13   (2)  26   (3)   9   (4)  13   (5)  11               72
(6)  15   (7)  18   (8)  22   (9)  11   (10) 15               81
(11) 19   (12) 10   (13) 10   (14) 10   (15) 16               65
(16) 21   (17) 16   (18) 17   (19)  4   (20) 17               75
(21) 15   (22) 12   (23) 13   (24) 20   (25)  8               68
                                                             361

Replication IV
(1)  16   (6)   7   (11) 20   (16) 13   (21) 21               77
(2)  15   (7)  10   (12) 11   (17)  7   (22) 14               57
(3)   7   (8)  11   (13) 15   (18) 15   (23) 16               64
(4)  19   (9)  14   (14) 20   (19)  6   (24) 16               75
(5)  17   (10) 18   (15) 20   (20) 15   (25) 14               84
                                                             357
CHAPTER 20

FACTORIAL EXPERIMENTS

20.1. Introduction. The experimenter often wishes to test various
types of treatments, each with several different representatives. For
example, he might wish to compare two varieties of corn ($v_1$ and $v_2$) and
two different fertilizers ($f_1$ and $f_2$) in a single experiment, giving a total of
four treatment combinations; this is called a 2 × 2, or $2^2$, factorial
experiment. These treatments could be tested in any of the
field designs presented in Chaps. 18 and 19. The methods of analysis
presented in those chapters would apply to this problem if all four
treatment combinations were considered to be four unrelated treatments;
however, if there are treatment differences, the factorial design can be
used to explain these differences in a more definite manner. Using the
factorial model, the effect of any treatment combination is considered to
be the sum of three effects: varietal effect, fertilizer effect, and interaction
of the variety and fertilizer. The interaction measures the failure of the
various fertilizer effects to be the same for each variety or, conversely, the
failure of the various varietal effects to be the same with each fertilizer.

The interaction is the important effect about which the factorial design
can give information. Many experimenters still examine the performance
of one set of treatments, such as different fertilizers, for one standard
variety and then different varieties for a standard fertilizer. Such an
experiment tells little about the optimum fertilizer-variety combination
which should be used, if the fertilizers do not respond in a similar manner
for all varieties. Or if an engineer wants to know something about the
relationship between the temperature of a process and the length of time
the process is carried on, he needs to try out various combinations of the
two variables, temperature and time. Similarly an animal feeder may
want to know the optimum level of supplemental feeding and type of
pasture or the optimum combination of concentrates and roughage in the
ration. And the human nutritionist needs to know the best combination
of various parts of the diet for healthy living. All of these experiments
require some knowledge of how different amounts or kinds of one
treatment interact with different amounts or kinds of another treatment. If
the results are purely additive, that is, one treatment acts independently
of the other treatment, the experiment can be divided into two simple
experiments on the two treatments. However, the experimenter seldom is
sure that there is no interaction and often is afraid that there will be some
interaction, especially if the individual representatives of each treatment
are widely different.

It should be pointed out that the factorial design can never lose any
information even when there is no interaction (and if there is interaction,
this design is indispensable). To illustrate this, suppose we consider the
two varieties and two fertilizers, each of the four treatment combinations
being used r times in an experiment. If there is no interaction,
the average difference between the two varieties is obtained from 2r
individual differences, and similarly for the difference between the two
fertilizers. Hence in the absence of interaction, we have obtained 2r
replications for each variety and for each fertilizer. This feature of the
factorial design, called hidden replication, should not be overemphasized
because it is built upon the thesis of no interaction. But it can be used
as an argument against those who refuse to use this design, because they
think they lose information when they have no interaction. The authors
are firm believers in a wider use of the factorial type of experiment
because one cannot lose any information on the main effects and one can
obtain information on something which may be equally important, the
interactions. The reader is encouraged to read references 1 to 3 for more
extensive discussions of the analysis and usefulness of various types of
factorial designs.

20.2. The Analysis of a p × q Factorial Experiment. Let us assume
we have two sets of treatments (A and C), with p A treatments and q C
treatments, with a randomized complete-blocks experimental plan. The
experimental model can be written as follows (assuming r blocks of pq
treatments per block):

$$
\begin{aligned}
Y_{ijk} &= \mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij} + \beta_k + \epsilon_{ijk} \\
        &= m + a_i + c_j + (ac)_{ij} + b_k + e_{ijk},
\end{aligned}
$$

where $\alpha_i$ is the added effect of the ith A treatment (i = 1, 2, . . . , p);
$\gamma_j$ is the added effect of the jth C treatment (j = 1, 2, . . . , q); $(\alpha\gamma)_{ij}$ is
the added effect of the interaction of the ith A treatment and jth C
treatment; and $\beta_k$ is the added effect of the kth block (k = 1, 2, . . . , r).†
The estimates are $a_i$, $c_j$, $(ac)_{ij}$, and $b_k$, respectively. $\alpha_i$ and $\gamma_j$ are called
the main effects. The following restrictions are imposed: $\sum_i \alpha_i = 0$;
$\sum_j \gamma_j = 0$; $\sum_i (\alpha\gamma)_{ij} = \sum_j (\alpha\gamma)_{ij} = 0$.

† If a completely randomized design is used, $\beta_k$ is omitted from the model; and for
a Latin-square design row and column constants replace $\beta_k$. The use of the
incomplete-blocks design will be discussed later.

It should be emphasized that the effect of a given A treatment is
assumed to be the same for all C treatments; this assumption is justified
only if there is no interaction. If there is an interaction, one should
logically be studying different A effects for each C treatment and,
conversely, different C effects for each A treatment. Hence, if there is a real
interaction, the experimenter will want to compare the pq treatment
combinations. The analysis of most factorial experiments is usually a
two-stage process:

(i) Test the null hypotheses that all main effects and interactions
vanish.

(ii) If there is a significant interaction, then determine whether there
are real A effects for each C treatment and real C effects for each A
treatment.

The least-squares equations for this model are

$$
\begin{aligned}
A_i &= qrm + qra_i = qr\mu + qr\alpha_i + \sum_j \sum_k \epsilon_{ijk}, \\
C_j &= prm + prc_j = pr\mu + pr\gamma_j + \sum_i \sum_k \epsilon_{ijk}, \\
(AC)_{ij} &= r[m + a_i + c_j + (ac)_{ij}] = r[\mu + \alpha_i + \gamma_j + (\alpha\gamma)_{ij}] + \sum_k \epsilon_{ijk}, \\
B_k &= pqm + pqb_k = pq(\mu + \beta_k) + \sum_i \sum_j \epsilon_{ijk}, \\
G &= pqrm = pqr\mu + \sum_i \sum_j \sum_k \epsilon_{ijk},
\end{aligned}
$$

where $A_i$ is the total yield for the qr plots using the ith A treatment, $C_j$ for
the pr plots using the jth C treatment, $(AC)_{ij}$ for the r plots using the ith
A treatment and the jth C treatment, $B_k$ for the pq plots of the kth block,
and G for the total pqr plots. Hence the solutions are

$$
a_i = \frac{A_i}{qr} - m, \qquad
c_j = \frac{C_j}{pr} - m, \qquad
(ac)_{ij} = \frac{(AC)_{ij}}{r} - \frac{A_i}{qr} - \frac{C_j}{pr} + m.
$$


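It can be shown that the reductions in sum of squares due to the various
treatment effects are:

$$
\begin{aligned}
A:\quad SSA &= \frac{\sum_i A_i^2}{qr} - \frac{G^2}{pqr} = \frac{\sum_i (A_i - \bar{A})^2}{qr}, \\
C:\quad SSC &= \frac{\sum_j C_j^2}{pr} - \frac{G^2}{pqr} = \frac{\sum_j (C_j - \bar{C})^2}{pr}, \\
(AC):\quad SS(AC) &= \frac{\sum_i \sum_j \left[ (AC)_{ij} - \dfrac{A_i}{q} - \dfrac{C_j}{p} + \dfrac{G}{pq} \right]^2}{r}
 = \frac{\sum_i \sum_j (AC)_{ij}^2}{r} - \frac{G^2}{pqr} - SSA - SSC,
\end{aligned}
$$

where $\bar{A} = G/p$ and $\bar{C} = G/q$.

The analysis-of-variance table is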


Source of       Degrees of          Sum of     Mean
variation       freedom             squares    square     E(MS)

Blocks          r - 1               SSB        MSB        $\sigma^2 + pq \sum_k \beta_k^2/(r - 1)$
A treatments    p - 1               SSA        MSA        $\sigma^2 + qr \sum_i \alpha_i^2/(p - 1)$
C treatments    q - 1               SSC        MSC        $\sigma^2 + pr \sum_j \gamma_j^2/(q - 1)$
(AC)            (p - 1)(q - 1)      SS(AC)     MS(AC)     $\sigma^2 + r \sum_i \sum_j (\alpha\gamma)_{ij}^2/(p - 1)(q - 1)$
Error           (pq - 1)(r - 1)     SSE        $s^2$      $\sigma^2$

The F ratios, $MSA/s^2$, $MSC/s^2$, and $MS(AC)/s^2$, can be used to test for the
existence of real main effects and interactions.

If the AC interaction were significantly greater than zero, the
experimenter probably would want to test the A treatments for each C
treatment and the C treatments for each A treatment. For each C treatment
there would be (p - 1) degrees of freedom to test the A treatments; and
for each A treatment, there would be (q - 1) degrees of freedom to test
the C treatments. These two analyses are not orthogonal; hence, the
significance levels may be somewhat upset, but the experimenter should
make them, perhaps using $\alpha = .01$ or lower significance levels. When p
and q are rather large, one would expect several significant F's by chance,
because of the large number of comparisons.

The extension of these results to more than two sets of treatments is
simple, the only complication being the greater variety of interactions.
For example, if there are three sets of treatments (A, C, and D), we have
the following interactions:

Two-factor: AC, AD, CD.
Three-factor: ACD.

The two-factor interactions are handled as above, while the three-factor
interaction is handled as the remainder after accounting for the main
effects and two-factor interactions. A significant three-factor interaction
is more difficult to interpret. Many experiments are planned with
r = 1, but with the three-factor and higher-factor interactions used
to estimate the error mean square. We do not have the space to discuss
these wider uses of the factorial design in this book, but they are discussed
in detail in references 1 and 3.

Example 20.1. An experiment using 3 varieties of sugar cane (V) and
3 levels of nitrogen (N) was conducted in Hawaii in 1942, with 4
replications. The nitrogen levels were, respectively, 150, 210, and 270 pounds
per acre. The yields in tons of cane per acre were:

Blocks   V1N1    V1N2    V1N3    V2N1    V2N2    V2N3    V3N1    V3N2    V3N3      Total
1        70.5    67.3    79.9    58.6    64.3    64.4    65.8    64.1    56.3      591.2
2        67.5    75.9    72.8    65.2    48.3    67.3    68.3    64.8    54.7      584.8
3        63.9    72.2    64.8    70.2    74.0    78.0    72.7    70.9    66.2      632.9
4        64.2    60.5    86.3    51.8    63.6    72.0    67.6    58.3    54.4      578.7
Total   266.1   275.9   303.8   245.8   250.2   281.7   274.4   258.1   231.6    2,387.6

It is usually best to arrange these results in a 3 × 3 table of means as
follows:

          V1       V2       V3       Mean
N1       66.52    61.45    68.60    65.52
N2       68.98    62.55    64.52    65.35
N3       75.95    70.42    57.90    68.09
Mean     70.48    64.81    63.67    66.32

$C = (2{,}387.6)^2/36 = 158{,}350.94.$


The analysis of variance is: 


Source of      Degrees of     Sum of        Mean
variation      freedom        squares       square

Blocks              3           200.68       66.89
Varieties           2           319.37      159.69*
Nitrogen            2            56.54       28.27
V × N               4           559.79      139.95*
Error              24         1,053.78       43.91

* Significant at the $\alpha = .05$ level.
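The sums of squares in this table follow directly from the formulas of Sec. 20.2 and can be rechecked with a short Python sketch. The function name `factorial_anova` and the layout of the data (one list per block, columns ordered V1N1, V1N2, . . . , V3N3) are choices made here for illustration; the yields are those of Example 20.1.

```python
# Yields (tons of cane per acre) from Example 20.1, one row per block,
# columns ordered V1N1, V1N2, V1N3, V2N1, ..., V3N3.
yields = [
    [70.5, 67.3, 79.9, 58.6, 64.3, 64.4, 65.8, 64.1, 56.3],
    [67.5, 75.9, 72.8, 65.2, 48.3, 67.3, 68.3, 64.8, 54.7],
    [63.9, 72.2, 64.8, 70.2, 74.0, 78.0, 72.7, 70.9, 66.2],
    [64.2, 60.5, 86.3, 51.8, 63.6, 72.0, 67.6, 58.3, 54.4],
]

def factorial_anova(y, p, q):
    """p x q factorial in r randomized complete blocks (A = rows of the cell index)."""
    r = len(y)
    G = sum(sum(row) for row in y)
    C = G * G / (p * q * r)                                   # correction term
    cell = [sum(row[c] for row in y) for c in range(p * q)]   # (AC)_ij totals
    A_tot = [sum(cell[i * q + j] for j in range(q)) for i in range(p)]
    C_tot = [sum(cell[i * q + j] for i in range(p)) for j in range(q)]
    B_tot = [sum(row) for row in y]
    ss = {
        "blocks": sum(b * b for b in B_tot) / (p * q) - C,
        "A": sum(a * a for a in A_tot) / (q * r) - C,
        "C": sum(c * c for c in C_tot) / (p * r) - C,
    }
    ss["AC"] = sum(t * t for t in cell) / r - C - ss["A"] - ss["C"]
    total = sum(v * v for row in y for v in row) - C
    ss["error"] = total - sum(ss.values())
    df = {"blocks": r - 1, "A": p - 1, "C": q - 1,
          "AC": (p - 1) * (q - 1), "error": (p * q - 1) * (r - 1)}
    ms = {k: ss[k] / df[k] for k in ss}
    return ss, df, ms

ss, df, ms = factorial_anova(yields, p=3, q=3)   # A = varieties, C = nitrogen
for source in ("blocks", "A", "C", "AC", "error"):
    print(f"{source:7s} {df[source]:3d} {ss[source]:10.2f} {ms[source]:8.2f}")
print("F ratios:", {k: round(ms[k] / ms["error"], 2) for k in ("A", "C", "AC")})
```

The F ratios are then compared with tabled values for the corresponding degrees of freedom, as in the text.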
There were significant differences among the average varietal effects,
but not among the average nitrogen effects. However, a significant
interaction indicates that the nitrogens have different effects for each
variety, possibly in opposite directions. In this case, $N_3$ is best for $V_1$
and $V_2$, while $N_1$ is best for $V_3$. The standard error for the difference
between any two of the nine treatment means is $\sqrt{2(43.91)/4} = 4.68$. For
$V_1$, $N_3 - N_1$ (= 9.43) is almost significant and for $V_3$, $N_1 - N_3$ (= 10.70)
is significant, at the 5 per cent probability level. These results are displayed
in Fig. 20.1. These lines show that $V_1$ is better than $V_2$ and that nitrogen
affects both in the same way, with a decided jump in yield for the third level.
However, nitrogen had an adverse effect on the third variety. A closer
examination of the experiment disclosed that $V_3$ with a large amount of
nitrogen matured before the other two varieties, but all varieties were
harvested at the same time. Hence some of the cane for $V_3$ with $N_2$ and $N_3$
had rotted, resulting in decreased yields for the high nitrogen treatments.

[Fig. 20.1. Effects of different nitrogen treatments on the yield of
three varieties of sugar cane. Horizontal axis: nitrogen level (1, 2, 3).]
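The two comparisons quoted for Example 20.1 can be reproduced with a small sketch. The 5 per cent two-sided t value for 24 degrees of freedom (2.064) is taken from standard tables and is the only number not given in the text; the variable names are illustrative.

```python
import math

s2, r = 43.91, 4                      # error mean square and plots per treatment mean
se_diff = math.sqrt(2 * s2 / r)       # standard error of a difference of two means
t_05 = 2.064                          # tabled two-sided 5% point of t, 24 d.f. (assumed)
lsd = t_05 * se_diff                  # smallest difference significant at 5%

diff_v1 = 75.95 - 66.52               # V1: N3 - N1
diff_v3 = 68.60 - 57.90               # V3: N1 - N3
print(f"SE = {se_diff:.3f}, 5% significant difference = {lsd:.2f}")
print(f"V1: {diff_v1:.2f} significant? {diff_v1 > lsd}")
print(f"V3: {diff_v3:.2f} significant? {diff_v3 > lsd}")
```

Since these two tests were suggested by the data and are not orthogonal, the nominal 5 per cent level should be read with the caution given in Sec. 20.2.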

20.3. An Alternative Analysis for a 2 × 2 Factorial Experiment. An
alternative computing process for the treatment sums of squares makes
use of the orthogonal linear forms introduced in Chap. 6. Since there are
three treatment degrees of freedom, we can set up three orthogonal forms.
These three orthogonal comparisons plus G are:




                     Original treatment totals
               (AC)_11   (AC)_12   (AC)_21   (AC)_22     E(L^2)

L_1 = G           +         +         +         +        $16r^2\mu^2 + 4r\sigma^2$
L_2 = A           -         -         +         +        $4r^2(\alpha_2 - \alpha_1)^2 + 4r\sigma^2$
L_3 = C           -         +         -         +        $4r^2(\gamma_2 - \gamma_1)^2 + 4r\sigma^2$
L_4 = (AC)        +         -         -         +        $r^2 I^2 + 4r\sigma^2$

where $I = [(\alpha\gamma)_{11} - (\alpha\gamma)_{12} - (\alpha\gamma)_{21} + (\alpha\gamma)_{22}]$. The computing
procedure for $L_2$ is simply to subtract the total yield of the first A treatment
from that of the second A treatment, and similarly for the other L's.
From $E(L_2^2)$ we see that if we set up the null hypothesis $H_0$: $\alpha_1 = \alpha_2$,
$E(L_2^2) = 4r\sigma^2$, and if we divide $L_2^2$ by 4r, we have an unbiased estimate
of $\sigma^2$. Hence, to test $H_0$, we need only compute the ratio

$$F = \frac{L_2^2}{4rs^2},$$

with 1 and 3(r - 1) degrees of freedom. And we see that if $\alpha_1 \neq \alpha_2$, the
expected value of the numerator of F is greater than that of the
denominator; hence, the one-tailed F should be used. To test $H_0$: $\gamma_1 = \gamma_2$, we
use $F = L_3^2/4rs^2$. It is easily seen that if we use $F = L_4^2/4rs^2$, we are
testing whether $[(\alpha\gamma)_{11} - (\alpha\gamma)_{12}]$ is different from $[(\alpha\gamma)_{21} - (\alpha\gamma)_{22}]$. In
other words, is the difference between the two C treatments the same
using the first A treatment as using the second? Or, conversely, we can
use $[(\alpha\gamma)_{11} - (\alpha\gamma)_{21}]$ versus $[(\alpha\gamma)_{12} - (\alpha\gamma)_{22}]$. Since this is exactly what
we wanted to test by our interaction effect, it seems reasonable to use $L_4$.
Example 20.2. An experiment was run to determine the effectiveness
of four different fertilizers on the total yield of oranges over a 12-year
period (1928 to 1939), at the Citrus Experiment Station, Riverside, Calif.
The four fertilizers were nitrogen (N), nitrogen + phosphate (NP),
nitrogen + potash (NK), and nitrogen + phosphate + potash (NPK).
Four blocks were used, giving a total of 16 plots. The yields in pounds of
oranges per tree per plot were as follows:

                          Block
Treatment       1        2        3        4       Total    Average

N             1,290    1,531    1,469    1,631     5,921     1,480
NP            1,085    1,348    1,555    1,328     5,316     1,329
NK            1,479    1,484    1,556    1,759     6,278     1,570
NPK           1,293    1,538    1,561    1,639     6,031     1,508
Total         5,147    5,901    6,141    6,357    23,546     1,472


The following orthogonal comparisons can be made: 


                        Treatment
                N       NP       NK      NPK         L         L^2/16
Total yield   5,921    5,316    6,278    6,031

L_1 = G         +        +        +        +       23,546    34,650,882
L_2 = P         -        +        -        +         -852        45,369
L_3 = K         -        -        +        +        1,072        71,824
L_4 = PK        +        -        -        +          358         8,010
Treatments                                                      125,203

The total sum of squares is

$$\sum y^2 = \sum Y^2 - G^2/16 = 35{,}067{,}370 - 34{,}650{,}882 = 416{,}488.$$

The block sum of squares is

$$SSB = 34{,}859{,}185 - 34{,}650{,}882 = 208{,}303.$$

Hence the error sum of squares is

$$SSE = \sum y^2 - SST - SSB = 82{,}982.$$

The analysis-of-variance table is as follows: 


Source of      Degrees of     Sum of        Mean
variation      freedom        squares       square

Blocks              3          208,303       69,434
P                   1           45,369       45,369
K                   1           71,824       71,824*
PK                  1            8,010        8,010
Error               9           82,982        9,220

* Significant at the 5 per cent probability level.


Since the 5 per cent F value for 1 and 9 degrees of freedom is 5.12, we
see that P is not quite significant, while K is significant. There is no
indication of a real PK interaction. It would appear that added
potash definitely increased the yield and that added phosphate
possibly decreased the yield of oranges. In order to determine
if there was a significant profit from the use of the added fertilizer,
the cost of the fertilizer should be compared with the added returns
from the oranges. These results are displayed graphically in Fig.
20.2. There is some evidence of a downward slope, indicating the
detrimental effect of phosphate. Since the two lines are nearly
parallel, there is no indication of an interaction between phosphate
and potash.

[Fig. 20.2. Effects of different amounts of potash and phosphate on the
yield of oranges.]
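The single-degree-of-freedom computations of Example 20.2 can be sketched as follows. The dictionary layout and variable names are illustrative; the treatment totals, the divisor 4r = 16, and the error sum of squares (82,982 on 9 degrees of freedom) are those given in the example.

```python
# Treatment totals from Example 20.2 (r = 4 blocks).
totals = {"N": 5921, "NP": 5316, "NK": 6278, "NPK": 6031}
r = 4

# Sign patterns of the three orthogonal comparisons.
signs = {
    "P":  {"N": -1, "NP": +1, "NK": -1, "NPK": +1},
    "K":  {"N": -1, "NP": -1, "NK": +1, "NPK": +1},
    "PK": {"N": +1, "NP": -1, "NK": -1, "NPK": +1},
}

L = {name: sum(row[t] * totals[t] for t in totals) for name, row in signs.items()}
ss = {name: L[name] ** 2 / (4 * r) for name in L}    # L^2 / 16, one d.f. each

s2 = 82982 / 9                                       # error mean square
F = {name: ss[name] / s2 for name in ss}

for name in ("P", "K", "PK"):
    print(f"{name:2s}  L = {L[name]:6d}  SS = {ss[name]:10.2f}  F = {F[name]:.2f}")
print("treatment SS =", sum(ss.values()))
```

Each F is compared with 5.12, the tabled 5 per cent value for 1 and 9 degrees of freedom quoted in the text.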

20.4. The Alternative Analysis of a p × q Factorial. The general
p × q factorial experiment can also be analyzed by the
single-degree-of-freedom approach of Sec. 20.3. Often the representatives of one or both
sets of treatments are simply levels of that treatment. For example, the
treatments might be three levels of a nitrogen fertilizer (A) and three
levels of a phosphate fertilizer (C), as in Exercise 20.4. An example of
three fertilizer levels and two varieties is presented in Example 20.3.
The experimenter should decide before the experiment is performed what
comparisons should be made. It is not necessary to restrict oneself to
orthogonal comparisons, except that most pertinent comparisons can be
set up as one of a set of orthogonal comparisons. The significance
probabilities presumably are less disturbed if each comparison is orthogonal to
the others. Snedecor 4 (Chap. 15) discusses these single comparisons.

A factorial experiment with more than two sets of treatments can also
be analyzed by the single-degree-of-freedom approach.

Matrix algebra techniques can be used to prove that the total treatment
sum of squares obtained by the single-degree-of-freedom approach is the
same as the treatment sum of squares of Chap. 18.

If there are t treatment combinations and we set up t orthogonal
comparisons (such as the 4 L's in Sec. 20.3),

$$\sum_{h=1}^{t} L_h'^2 = \sum_{i=1}^{t} T_i^2,$$

where $L_h'^2 = L_h^2/[\text{coefficient of } r\sigma^2 \text{ in } E(L_h^2)]$ and $T_i$ is the total yield for the
ith treatment combination (based on r plots). We consider that there
are t orthogonal forms

$$L_h' = \sum_{i=1}^{t} a_{hi} T_i, \qquad h = 1, 2, \ldots, t,$$

where $\sum_{i=1}^{t} a_{hi} a_{h'i} = 0$ for $h \neq h'$ and $= 1$ for $h = h'$. If we solve for the $T_i$ in terms of the
$L_h'$, we find that

$$T_i = \sum_{h=1}^{t} a_{hi} L_h'.$$

Hence

$$
\sum_i T_i^2 = \sum_i \Bigl( \sum_h a_{hi} L_h' \Bigr)^2
 = \sum_h L_h'^2 \sum_i a_{hi}^2 + \sum_{h \neq h'} \sum_i a_{hi} a_{h'i} L_h' L_{h'}'
 = \sum_{h=1}^{t} L_h'^2.
$$

Now if we let $L_1' = G/\sqrt{t}$, as in Sec. 20.3,

$$\sum_i T_i^2 - G^2/t = \sum_{h=2}^{t} L_h'^2.$$

The variance of $L_h'$ is

$$\sigma^2(L_h') = \sum_i \sum_{i'} a_{hi} a_{hi'} \sigma(T_i T_{i'}) = r\sigma^2,$$

since $\sigma(T_i T_{i'}) = 0$ for $i \neq i'$ and $= r\sigma^2$ for $i = i'$, and $\sum_i a_{hi}^2 = 1$.
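The identity between the treatment sum of squares and the squared orthogonal forms can be checked numerically. The sketch below uses the four treatment totals of Example 20.2 and the sign patterns of Sec. 20.3, scaled so that the squared coefficients of each form sum to 1; the variable names are illustrative.

```python
import math

T = [5921, 5316, 6278, 6031]          # treatment totals, Example 20.2
t = len(T)

raw = [
    [+1, +1, +1, +1],                 # G
    [-1, +1, -1, +1],                 # P
    [-1, -1, +1, +1],                 # K
    [+1, -1, -1, +1],                 # PK
]
# Scale each row so that its squared coefficients sum to 1.
a = [[c / math.sqrt(sum(x * x for x in row)) for c in row] for row in raw]

L_prime = [sum(a[h][i] * T[i] for i in range(t)) for h in range(t)]

lhs = sum(Ti ** 2 for Ti in T) - sum(T) ** 2 / t
rhs = sum(Lh ** 2 for Lh in L_prime[1:])
print(lhs, rhs)

# The totals are recovered from the forms: T_i = sum_h a_hi L'_h.
recovered = [sum(a[h][i] * L_prime[h] for h in range(t)) for i in range(t)]
print(recovered)
```

Dividing either side by r = 4 gives the treatment sum of squares of Example 20.2 (about 125,203).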

Example 20.3. An experiment on a small grain was conducted with
two varieties ($v_1$ and $v_2$), three fertilizer levels ($f_1$, $f_2$, and $f_3$), and six blocks.
The yields in pounds per plot were:

                         Block                                    Linear forms
F    V      1      2      3      4      5      6     Total    F_l   F_q    V   (F_l V)  (F_q V)

1    1     161    166    113    103    132    180      855     -     +     -      +       -
1    2     192    253    208    171    196    198    1,218     -     +     +      -       +
2    1     145    231    131    158    176    216    1,057     0    -2     -      0      +2
2    2     232    231    190    171    242    238    1,304     0    -2     +      0      -2
3    1     172    204    104    135    178    175      968     +     +     -      -       -
3    2     227    214    144    146    186    230    1,147     +     +     +      +       +
Total    1,129  1,299    890    884  1,110  1,237    6,549    42  -534   789   -184      48


In this case we need two linear forms to represent the fertilizer effects.
In general the experimenter wants to know whether there is a consistent
linear trend, as indicated by $F_l$, and whether there is a significant
departure from linearity, in this case indicated by the quadratic component,
$F_q$. $F_l$ measures the change from the lowest to the highest level of
fertilizer, while $F_q$ compares the sum of the yields at these two extreme levels
against twice the middle yield. If the response of yield to fertilizer is
linear, $F_q$ will not be significantly different from zero. The two
interaction forms are simply the respective products of $F_l$ and $F_q$ with V.
$(F_l V)$ represents the failure of the linear trend to be the same for the two
varieties. That is, $(F_l V) = [(FV)_{32} - (FV)_{12}] - [(FV)_{31} - (FV)_{11}]$. A
similar statement can be made for $(F_q V)$.

The expected values of the squares of the linear forms are:

Linear form         E(L^2)                                      L^2/(coeff. of σ^2)

F_l = 42            $144(f_3 - f_1)^2 + 24\sigma^2$                     73.50
F_q = -534          $144(f_1 - 2f_2 + f_3)^2 + 72\sigma^2$           3,960.50
V = 789             $324(v_2 - v_1)^2 + 36\sigma^2$                 17,292.25
(F_l V) = -184      $144 I_l^2 + 24\sigma^2$                         1,410.67
(F_q V) = 48        $144 I_q^2 + 72\sigma^2$                            32.00
                                                        SST = 22,768.92

$I_l = (fv_{32} - fv_{12} - fv_{31} + fv_{11})$, and $I_q = (fv_{32} - 2fv_{22} + fv_{12} - fv_{31} + 2fv_{21} - fv_{11})$.
We have used f and v for the true fertilizer and variety effects to avoid the difficulty of
identification with Greek letters.

The analysis of variance is: 


Source of      Degrees of     Sum of          Mean
variation      freedom        squares         square

Blocks              5         24,938.92       4,987.78
F_l                 1             73.50          73.50
F_q                 1          3,960.50       3,960.50**
V                   1         17,292.25      17,292.25**
(F_l V)             1          1,410.67       1,410.67
(F_q V)             1             32.00          32.00
Error              25          9,896.91         395.88

** Significant at the 1 per cent probability level.
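The five linear forms of Example 20.3 and their sums of squares can be recomputed from the treatment totals. In this sketch the divisor for each form is taken as r times the sum of its squared coefficients, which reproduces the coefficients of $\sigma^2$ listed above (24, 72, 36, 24, 72); the names used are illustrative.

```python
# Treatment totals from Example 20.3, ordered f1v1, f1v2, f2v1, f2v2, f3v1, f3v2.
totals = [855, 1218, 1057, 1304, 968, 1147]
r = 6                                             # blocks

F_l = [-1, -1, 0, 0, +1, +1]                      # linear fertilizer trend
F_q = [+1, +1, -2, -2, +1, +1]                    # quadratic component
V = [-1, +1, -1, +1, -1, +1]                      # variety difference
forms = {
    "F_l": F_l,
    "F_q": F_q,
    "V": V,
    "F_lV": [a * b for a, b in zip(F_l, V)],      # interaction forms are products
    "F_qV": [a * b for a, b in zip(F_q, V)],
}

L = {k: sum(c * t for c, t in zip(coef, totals)) for k, coef in forms.items()}
divisor = {k: r * sum(c * c for c in coef) for k, coef in forms.items()}
ss = {k: L[k] ** 2 / divisor[k] for k in forms}

for k in forms:
    print(f"{k:5s} L = {L[k]:5d}  divisor = {divisor[k]:2d}  SS = {ss[k]:10.2f}")
print("SST =", round(sum(ss.values()), 2))

# Every pair of forms is orthogonal (the cross products of coefficients sum to 0).
names = list(forms)
orthogonal = all(
    sum(x * y for x, y in zip(forms[m], forms[n])) == 0
    for i, m in enumerate(names) for n in names[i + 1:]
)
print("orthogonal:", orthogonal)
```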

Apparently the effect of fertilizer was definitely nonlinear, with
the maximum yield near the middle rate of application and
the yield at the highest rate little more than the yield at the lowest
rate. There was a real difference between the two varieties, but
the fertilizer effects were about the same for the two varieties, as
shown by the nonsignificant interaction. These statements are
confirmed graphically in Fig. 20.3, where we see that both lines are
almost parallel and that each shows a decline for the third level of
fertilizer. The estimated efficiency of the randomized complete-blocks design
as compared with a completely randomized design (neglecting the loss of
error degrees of freedom) is

$$\frac{30(395.88) + 24{,}938.92}{35(395.88)} = 2.65,$$

indicating an expected increase of about 165 per cent in the error variance
if the completely randomized design had been used.

[Fig. 20.3. Effects of different levels of fertilizer on the yields of two
varieties of small grain. Horizontal axis: level of fertilizer (1, 2, 3).]
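The efficiency figure can be reproduced with a short function. The general form used here, in which the block sum of squares is pooled with (treatment + error) degrees of freedom times $s^2$ and divided by the total degrees of freedom, is an interpretation of the numbers 30 and 35 in the expression above; the function name is hypothetical.

```python
def rcb_efficiency(ss_blocks, s2, df_blocks, df_treat, df_error):
    """Estimated error variance of a completely randomized design, relative to s^2."""
    df_total = df_blocks + df_treat + df_error
    return ((df_treat + df_error) * s2 + ss_blocks) / (df_total * s2)

eff = rcb_efficiency(ss_blocks=24938.92, s2=395.88,
                     df_blocks=5, df_treat=5, df_error=25)
print(f"efficiency = {eff:.3f}")
print(f"expected increase in error variance = {100 * (eff - 1):.0f} per cent")
```

The unrounded ratio is a little above 2.65, so the percentage printed may differ by one unit from the rounded figure quoted in the text.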

If the experimenter were interested only in the over-all fertilizer effects,
he would compute

$$SSF = \frac{F_1^2 + F_2^2 + F_3^2}{12} - \frac{G^2}{36} = 4{,}034.00,$$

$$SS(FV) = \frac{(FV)_{11}^2 + \cdots + (FV)_{32}^2}{6} - \frac{G^2}{36} - SSV - SSF = 1{,}442.67.$$


20.5. The Analysis of a p × q Factorial Experiment with Unequal
Numbers in the Subclasses. The factorial experiments in the previous
sections of this chapter have had a fixed number of replications (r) per
treatment combination. If the number of replications is not the same
from one subclass to another, the analysis is much more complicated.
Snedecor⁴ shows that if the numbers are proportional to the main effect
totals, the usual analysis can be carried out. That is, assume there are
$n_{ij}$ entries for the subclass with the ith A treatment and the jth C
treatment and $n_{i.} = \sum_j n_{ij}$, $n_{.j} = \sum_i n_{ij}$,
$n = \sum_i n_{i.}$. If $n_{ij} = n_{i.}n_{.j}/n$, the least-squares
equations show that the A, C, and AC effects are orthogonal and
the usual analysis-of-variance methods apply.
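The proportionality condition is easy to test for a given table of subclass numbers. The sketch below is illustrative only: the function name `is_proportional` and the first table of counts are hypothetical, while the second table uses the subclass numbers of the rat data analyzed in Example 20.4.

```python
def is_proportional(counts):
    """Return True if every n_ij equals n_i. * n_.j / n."""
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    n = sum(row_totals)
    # Compare n_ij * n with n_i. * n_.j so the check stays in exact integers.
    return all(
        counts[i][j] * n == row_totals[i] * col_totals[j]
        for i in range(len(counts))
        for j in range(len(counts[0]))
    )


proportional_counts = [[2, 4], [3, 6]]                  # hypothetical
rat_counts = [[21, 27], [15, 25], [12, 23], [7, 19]]    # Example 20.4

print(is_proportional(proportional_counts))
print(is_proportional(rat_counts))
```

When the check fails, as it does for the rat data, the usual analysis no longer applies and one of the methods described next is needed.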

However, if the subclass numbers are not proportional, the A, C, and
AC effects are confounded, requiring a complete least-squares solution of
the type presented in Chap. 15. This solution is simplified if it can be
assumed that there is no interaction, that is, the only effects are those for
A and C. Several alternative computing procedures have been outlined
by Yates,⁷ Snedecor and Cox,⁸ and Snedecor.⁴ The simplest procedure
is to compute the mean yield for each of the pq subclasses and to run
an analysis of variance of these means. An estimate of the error variance
to test for A, C, and AC effects is obtained from the sum of squares
within subclasses with n − pq degrees of freedom. First compute
$s^2 = \text{SS(within)}/(n - pq)$. Since the first analysis is based on
means, $s^2$ must be divided by the average number of entries per subclass.
The average used is the harmonic mean, $n_h$, where

$$\frac{1}{n_h} = \frac{1}{pq} \sum_{i,j} \frac{1}{n_{ij}}.$$

The appropriate error variance to use with the analysis of the means is
$s'^2 = s^2/n_h$.

The analysis of variance is:

Source of variation    Degrees of freedom    Mean square
A                      p − 1                 MSA
C                      q − 1                 MSC
AC                     (p − 1)(q − 1)        MS(AC)
Error                  n − pq                $s'^2 = s^2/n_h$


If some of the subclasses have no entries ($n_{ij} = 0$), this method of
unweighted means cannot be used. Hence it has limited usefulness for the
analysis of many social data, for which empty subclasses are common.
However, if it can be used, the method has a minimum of computation
and furnishes a short-cut procedure of testing for the existence of
interactions. This is important information in applying other methods.

If the $n_{ij}$ are almost proportional, an approximate method, called the
method of proportional subclass numbers, is probably better than the
method of unweighted means. However, the authors doubt that the
gain is often worth the extra computing time required. (See references
4 and 8 for the computing details.)

A third alternative is the method of weighted squares of means, advanced 
by Yates 7 and also described in references 4 and 8. This method provides 
exact tests of the main effects when interaction is present and for p = 2 
also provides an exact test of the interaction. 

The method of least squares furnishes an exact test for interactions
regardless of the size of p and, if there is no interaction, an exact test of
the main effects. First we assume no interaction and use the following
model:

$$Y_{ijk} = \mu + \alpha_i + \gamma_j + \epsilon'_{ijk}; \quad i = 1, 2, \ldots, p; \; j = 1, 2, \ldots, q; \; k = 1, 2, \ldots, n_{ij},$$

where $\alpha_i$ is the added effect of the ith A treatment, $\gamma_j$ the
added effect of the jth C treatment, and k the kth entry in the (i,j)
subclass. In order to have the estimate of $\mu$ equal $\bar{Y}$, we set
$\sum_{i=1}^{p} n_{i.}\alpha_i = 0$, $\sum_{j=1}^{q} n_{.j}\gamma_j = 0$.
The error sum of squares (SSE′) for this model is compared with the
within-subclass sum of squares (SSE) for the complete model

$$Y_{ijk} = \mu + \tau_{ij} + \epsilon_{ijk},$$

where $\tau_{ij}$ represents the effect of the (i,j) subclass. The error
variance will be $s^2 = \text{SSE}/(n - pq)$.

The first model is used to determine m, $a_i$, and $c_j$, using

$$\hat{Y}_{ij} = n_{ij}(m + a_i + c_j),$$

where $\hat{Y}_{ij}$ is the predicted total yield for the (i,j) subclass with
$n_{ij}$ entries and $\sum_i n_{i.}a_i = 0 = \sum_j n_{.j}c_j$. The
least-squares equations can be written as follows:

$$\begin{aligned}
m:&\quad nm + \sum_{i=1}^{p} n_{i.}a_i + \sum_{j=1}^{q} n_{.j}c_j = G,\\
a_i:&\quad n_{i.}m + n_{i.}a_i + \sum_{j=1}^{q} n_{ij}c_j = A_i,\\
c_j:&\quad n_{.j}m + \sum_{i=1}^{p} n_{ij}a_i + n_{.j}c_j = C_j,
\end{aligned}$$

where $A_i$ and $C_j$ are the respective treatment totals. Since

$$\sum_i n_{i.}a_i = 0 = \sum_j n_{.j}c_j,$$

$m = G/n = \bar{Y}$. In order to solve for the $\{a_i\}$ and $\{c_j\}$, a
method of matrix inversion or the forward solution of the abbreviated
Doolittle method can be used (see Chap. 15).

However, a short-cut procedure is available if the inverse matrix is
not wanted. This short cut is the same as the method of adjusting for
block effects in an incomplete-blocks design (see Sec. 19.2) and is
frequently called the sweep-out method. The $c_j$ are adjusted for the
$a_i$ by multiplying the $a_i$ equation by $(n_{ij}/n_{i.})$ and subtracting
the sum of all these altered $a_i$ equations from the $c_j$ equation. The
resultant equation is

$$n'_{j1}c_1 + n'_{j2}c_2 + \cdots + n'_{jj}c_j + \cdots + n'_{jq}c_q = C'_j,$$

where

$$n'_{jl} = \begin{cases}
n_{.j} - \sum_{i=1}^{p} n_{ij}^2/n_{i.}, & l = j,\\
-\sum_{i=1}^{p} n_{ij}n_{il}/n_{i.}, & l \ne j,
\end{cases}
\qquad
C'_j = C_j - \sum_{i=1}^{p} n_{ij}A_i/n_{i.}.$$

These q equations in the $\{c_j\}$ can then be handled by the forward
solution of the abbreviated Doolittle method to determine SSC (adjusted
for A) and the $\{c_j\}$. Since $\sum_j n_{.j}c_j = 0$, the constants in the
$c_q$ row will be zero. Hence the $C'_q$ equation is often omitted and the
effect $c_q$ eliminated from the other equations. However, it might be
advisable to retain $c_q$ and the $C'_q$ equation as a check on the
computations. The shortest computing time is achieved when p > q; if
p < q, reverse the A and C treatments in the analysis.

The analysis of variance can now be set up.

$$\text{SSA (unadjusted for } C) = \sum_{i=1}^{p} \frac{A_i^2}{n_{i.}} - \frac{G^2}{n},$$

and SSC (adjusted for A) is computed from the y column of the abbreviated
Doolittle computations. Hence the total sum of squares for A and
C is

$$\text{SS}(A + C) = \text{SSA(unadj)} + \text{SSC(adj)}.$$

The total sum of squares for all pq subclasses is

$$\sum_{i,j} \frac{T_{ij}^2}{n_{ij}} - \frac{G^2}{n},$$

where $T_{ij}$ is the total yield in the (i,j) subclass. Hence the residual,
due to interaction, is

$$\text{SS}(AC) = \sum_{i,j} \frac{T_{ij}^2}{n_{ij}} - \sum_{i} \frac{A_i^2}{n_{i.}} - \text{SSC(adj)}.$$

Also SSA (adjusted for C) can be computed by subtraction as follows:

$$\text{SSA(adj)} = \text{SS}(A + C) - \text{SSC(unadj)},$$

where

$$\text{SSC(unadj)} = \sum_{j=1}^{q} \frac{C_j^2}{n_{.j}} - \frac{G^2}{n}.$$

The analysis of variance is:


Source of variation    Degrees of freedom    Sum of squares    Mean square
A (unadj)              p − 1
C (adj)                q − 1                 SSC(adj)          MSC
C (unadj)              q − 1
A (adj)                p − 1                 SSA(adj)          MSA
AC                     (p − 1)(q − 1)        SS(AC)            MS(AC)
Error                  n − pq                SSE               $s^2$

Example 20.4. Some data were presented in Exercise 18.5 on the
gain in weight of 149 rats for four successive generations, both male and
female.⁸ In order to test for generation and sex effects and the
(generation × sex) interaction, an analysis for unequal subclass numbers is
needed. The analysis of variance for the unweighted means is as follows:

Source of variation    Degrees of freedom    Mean square
Generations            3                         42.08
Sex                    1                      6,394.67
G × S                  3                         58.22
Error                  141                       26.27

In computing the error mean square, $n_h = 15.576$ and
$s^2 = 57{,}695/141 = 409.2$.
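The harmonic mean and the error mean square for the analysis of means can be checked in a few lines. This is a minimal sketch, assuming the eight subclass numbers are the $n_{ij}$ that appear as coefficients in the least-squares equations of this example (21, 27, 15, 25, 12, 23, 7, 19); the variable names are illustrative.

```python
# Subclass numbers n_ij for the 4 generations x 2 sexes of Example 20.4.
subclass_counts = [21, 27, 15, 25, 12, 23, 7, 19]

# Harmonic mean: 1/n_h is the average of the reciprocals 1/n_ij.
n_h = len(subclass_counts) / sum(1.0 / n for n in subclass_counts)

# Within-subclass variance and the error variance for subclass means.
s2 = 57695 / 141
s2_means = s2 / n_h

print(f"n_h = {n_h:.3f}")
print(f"s^2 = {s2:.1f}, s'^2 = {s2_means:.2f}")
```

The error mean square in the table of unweighted means is $s'^2$, not $s^2$, because each entry analyzed is a subclass mean.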


The only significant effect is sex, but the F value for interaction of 2.22 is
slightly larger than the $F_{.10}$ value for 3 and 141 degrees of freedom.
Hence one might be hesitant about concluding there was no interaction;
apparently the main differences were sex differences with the possibility
that these differences were not the same from one generation to another.

In order to make the test for interaction more exact, we present the
least-squares solution. The least-squares equations, omitting the
interaction constants, are ($a_i$ for generation and $c_j$ for sex effects):

$$\begin{aligned}
m:&\quad 149m + 48a_1 + 40a_2 + 35a_3 + 26a_4 + 55c_1 + 94c_2 = 19{,}537,\\
a_1:&\quad 48m + 48a_1 + 21c_1 + 27c_2 = 6{,}673,\\
a_2:&\quad 40m + 40a_2 + 15c_1 + 25c_2 = 5{,}274,\\
a_3:&\quad 35m + 35a_3 + 12c_1 + 23c_2 = 4{,}364,\\
a_4:&\quad 26m + 26a_4 + 7c_1 + 19c_2 = 3{,}226,\\
c_1:&\quad 55m + 21a_1 + 15a_2 + 12a_3 + 7a_4 + 55c_1 = 9{,}203,\\
c_2:&\quad 94m + 27a_1 + 25a_2 + 23a_3 + 19a_4 + 94c_2 = 10{,}334.
\end{aligned}$$

The restrictions are

$$48a_1 + 40a_2 + 35a_3 + 26a_4 = 55c_1 + 94c_2 = 0.$$

The a effects are swept out of the c equations as follows:

$$\begin{aligned}
n'_{11} &= 55 - \left[\frac{(21)^2}{48} + \cdots + \frac{(7)^2}{26}\right] = 34.18860,\\
n'_{12} = n'_{21} &= -\left[\frac{(21)(27)}{48} + \cdots + \frac{(7)(19)}{26}\right] = -34.18860,\\
n'_{22} &= 94 - \left[\frac{(27)^2}{48} + \cdots + \frac{(19)^2}{26}\right] = 34.18860,\\
C'_1 &= 9{,}203 - \left[\frac{(21)(6{,}673)}{48} + \cdots + \frac{(7)(3{,}226)}{26}\right] = 1{,}941.045,\\
C'_2 &= 10{,}334 - \left[\frac{(27)(6{,}673)}{48} + \cdots + \frac{(19)(3{,}226)}{26}\right] = -1{,}941.045.
\end{aligned}$$

Hence the adjusted c equations are

$$34.18860(c_1 - c_2) = 1{,}941.045,$$
$$-34.18860(c_1 - c_2) = -1{,}941.045.$$

If we neglect the $C'_2$ equation and $c_2$ (or set $c'_1 = c_1 - c_2$), we have

$$c'_1 = 56.77463$$

and

$$\text{SSC(adj)} = c'_1(1{,}941.045) = 110{,}202.1.$$

Also

$$\text{SSA(unadj)} = \frac{(6{,}673)^2}{48} + \cdots + \frac{(3{,}226)^2}{26} - \frac{(19{,}537)^2}{149} = 5{,}756.4$$

and

$$\text{SSC(unadj)} = \frac{(9{,}203)^2}{55} + \frac{(10{,}334)^2}{94} - \frac{(19{,}537)^2}{149} = 114{,}286.1.$$

The total sum of squares for all eight subclasses was

$$\frac{(3{,}716)^2}{21} + \cdots + \frac{(2{,}029)^2}{19} - \frac{(19{,}537)^2}{149} = 119{,}141.0.$$

Hence the interaction sum of squares is

$$119{,}141.0 - 110{,}202.1 - 5{,}756.4 = 3{,}182.5.$$
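The sweep-out arithmetic for this example can be reproduced with a short script. This is a sketch rather than the abbreviated Doolittle layout: it takes the subclass numbers and the totals $A_i$, $C_j$ from the least-squares equations of this example and, because q = 2 here, solves the single remaining adjusted equation directly. The variable names are illustrative.

```python
# Subclass numbers n_ij (rows: generations 1-4; columns: sex 1, 2) and the
# generation and sex totals, read from the least-squares equations.
n = [[21, 27], [15, 25], [12, 23], [7, 19]]
A = [6673, 5274, 4364, 3226]
C = [9203, 10334]

p, q = len(n), len(n[0])
n_row = [sum(row) for row in n]          # n_i.
n_col = [sum(col) for col in zip(*n)]    # n_.j
N = sum(n_row)
G = sum(A)

# Sweep the a_i out of the c_j equations: coefficients n'_jl and totals C'_j.
n_adj = [
    [(n_col[j] if j == l else 0)
     - sum(n[i][j] * n[i][l] / n_row[i] for i in range(p))
     for l in range(q)]
    for j in range(q)
]
C_adj = [C[j] - sum(n[i][j] * A[i] / n_row[i] for i in range(p))
         for j in range(q)]

# With q = 2 the two adjusted equations are redundant, so the first one is
# solved for c'_1 = c_1 - c_2.
c_diff = C_adj[0] / n_adj[0][0]
ssc_adj = c_diff * C_adj[0]

correction = G * G / N
ssa_unadj = sum(a * a / m for a, m in zip(A, n_row)) - correction
ssc_unadj = sum(c * c / m for c, m in zip(C, n_col)) - correction
ssa_adj = ssa_unadj + ssc_adj - ssc_unadj

print(f"n'_11 = {n_adj[0][0]:.5f}, C'_1 = {C_adj[0]:.3f}, c'_1 = {c_diff:.5f}")
print(f"SSC(adj) = {ssc_adj:.1f}, SSA(unadj) = {ssa_unadj:.1f}")
print(f"SSC(unadj) = {ssc_unadj:.1f}, SSA(adj) = {ssa_adj:.1f}")
```

The interaction sum of squares is not recomputed here because only two of the eight subclass totals $T_{ij}$ are quoted in the text.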

The analysis of variance is:

Source of variation    Degrees of freedom    Mean square
Generations            3
Sex (adj)              1                     110,202
Sex                    1
Generation (adj)       3                         557
Sex × generation       3                       1,061
Error                  141                       409.2

The F value to test for interaction is

$$F = 1{,}061/409.2 = 2.59,$$

with 3 and 141 degrees of freedom, which is not quite significant at the
5 per cent probability level ($F_{.05} = 2.67$). Note that the exact F is
slightly larger than the F using the method of unweighted means. This
is generally true, as the exact method is somewhat more powerful.

If we conclude that there is no real interaction, we can use the error
mean square (409.2) to test for the adjusted sex and generation effects,
showing a highly significant sex difference but no real difference in gains
for the four generations. If we want a more exact test of over-all sex
effects without neglecting the interaction than the test using the
unweighted means, interaction constants can be inserted in the model and
the main effects adjusted for the interaction, or the method of weighted
squares of means can be used. However, the average main effect has
little meaning if interaction is present, because a real interaction indicates
that the sex difference is not the same from one generation to another.
Hence it would appear better to test for sex difference in each generation
using either the pooled $s^2$ or a separate $s^2$ for each generation. Pooling
is justified only if the within-subclass variance is constant from one
subclass to another. We have assumed the constancy of these variances,
but the reasonableness of the assumption should be checked.

It might be mentioned that a significant interaction is often evidence
of a multiplicative relationship. In this case if we analyzed Z = log Y,
the assumption of additivity would be more nearly correct. However,
it should be cautioned that if we analyze Z = log Y, we must assume that
the errors in Y also multiply so they become additive for Z.
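The effect of the logarithmic transformation can be written out explicitly. The following is an illustrative sketch in which the multiplicative constants $\mu^*$, $\alpha^*_i$, $\gamma^*_j$ and the multiplicative error $\epsilon^*_{ijk}$ are notation assumed here, not symbols taken from the text.

```latex
% Assumed multiplicative model for the original observations:
Y_{ijk} = \mu^{*}\,\alpha^{*}_{i}\,\gamma^{*}_{j}\,\epsilon^{*}_{ijk}
% Taking logarithms turns every product into a sum:
Z_{ijk} = \log Y_{ijk}
        = \log\mu^{*} + \log\alpha^{*}_{i} + \log\gamma^{*}_{j} + \log\epsilon^{*}_{ijk}
% which has the additive form  \mu + \alpha_i + \gamma_j + \epsilon_{ijk}
% only because the error entered Y as a factor rather than as an added term.
```

If the error were instead added to the product $\mu^{*}\alpha^{*}_{i}\gamma^{*}_{j}$, the logarithm would not separate it from the treatment effects, which is the caution stated above.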

20.6. The Use of Incomplete-blocks Designs for Factorial Experiments.
As mentioned in Chap. 19, when all the treatments are not used
in each block, the treatment and block effects become confounded. Since
the factorial designs are constructed to obtain information on single
comparisons, it seems reasonable to attempt to confound some of the
less important comparisons and leave the more important comparisons
free of block effects. For example, if we use a 2 × 2 × 2 factorial
experiment and only 4 plots are available per block, it would be desirable
to confound the ACD three-factor interaction and leave the other
effects free of block effects.

Designate the 8 treatment combinations as

111, 211, 121, 221, 112, 212, 122, 222,

where the first number refers to the A treatment (1 or 2), the second to the
C treatment, and the third to the D treatment. If we put treatments
(111, 221, 212, 122) in one block and (211, 121, 112, 222) in the second
block, the main effects and two-factor interactions will be clear of block
effects and ACD will be completely confounded with block effects.
However, if we repeat the experiment several times so that we have r
replications (2r blocks), a test can be made of ACD, using methods to be
discussed in Chap. 23 for split-plot designs. We shall defer a discussion of
this point until then and simply indicate the analysis-of-variance degrees
of freedom as follows:


Source of Variation    Degrees of Freedom
Blocks                 2r − 1
A                      1
C                      1
D                      1
AC                     1
AD                     1
CD                     1
Error                  6(r − 1)
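The two blocks given above can be generated from the sign of the ACD interaction, and the claim that the other effects stay clear of blocks can be checked at the same time. A minimal sketch, assuming the convention that level 2 carries a + sign and level 1 a − sign; the helper name `sign` is hypothetical.

```python
from itertools import product

FACTORS = "ACD"


def sign(treatment, effect):
    """Product of +1 (level 2) and -1 (level 1) over the letters in `effect`."""
    s = 1
    for letter in effect:
        s *= 1 if treatment[FACTORS.index(letter)] == "2" else -1
    return s


# Treatment codes in the order used in the text (first factor changes fastest).
treatments = ["".join(map(str, reversed(levels)))
              for levels in product((1, 2), repeat=3)]

# Treatments with the same ACD sign share a block.
blocks = {-1: [], 1: []}
for t in treatments:
    blocks[sign(t, "ACD")].append(t)

# An effect is clear of blocks if its signs sum to zero inside every block.
unconfounded = all(
    sum(sign(t, effect) for t in block) == 0
    for effect in ("A", "C", "D", "AC", "AD", "CD")
    for block in blocks.values()
)

print(blocks[-1], blocks[1], unconfounded)
```

Each main effect and two-factor interaction has two + and two − signs within every block, so block differences cancel out of those comparisons.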


The general procedure of confounding in factorial experiments is
extensively discussed in references 1 to 3. The reader is encouraged to
read these references if he is interested in setting up factorial experiments
with more treatment combinations than plots per block. Some exercises
are included at the end of the chapter.

20.7. Construction of Experimental Designs. In all of the theoretical
discussions in this and previous chapters, we have assumed that the design
was known and have proceeded to set up an analysis for this design.
The reader might be interested in knowing how to construct designs in the
first place so that the analysis will be relatively simple.

(a) There is no difficulty in setting up completely randomized or
randomized complete blocks or complete Latin-square designs, except to
remember that randomization is necessary.

(b) In planning confounded factorial designs, the main principle is to
restrict the confounding to high-order interactions, if this is possible.
One principle must always be remembered with $2^n$ designs: if two
interactions are confounded, then a third effect is also confounded; this
third effect is formed by casting out all like letters in the first two. For
example, if ABC and ABD are confounded, CD will be also. Hence it is
often necessary to adjust the confounding so as to protect main effects
and two-factor interactions. For more than two levels, the principles of
confounding are much more complicated. We shall present a few
examples of how to construct confounded factorials (see Yates³ for the
details of other examples).
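The rule of casting out like letters amounts to taking the letters that appear in exactly one of the two effects. A small sketch, with the hypothetical function name `generalized_interaction`:

```python
def generalized_interaction(effect1, effect2):
    """Effect that is also confounded when effect1 and effect2 are confounded
    in a 2^n design: letters common to both are cast out."""
    remaining = set(effect1) ^ set(effect2)   # symmetric difference
    return "".join(sorted(remaining))


print(generalized_interaction("ABC", "ABD"))
print(generalized_interaction("ABCD", "ABC"))
```

The second call illustrates why a confounding scheme has to be checked in advance: pairing the four-factor interaction with any three-factor interaction gives up a main effect.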
Example 20.5. If a design is of the $2^n$ character (n factors each at two
levels), the construction of a confounded design is simplified by the use
of the (+, −) system. For example, if we wish to use $2^4$ (= 16) treatments
in 4 blocks of 4 treatments each, we are led to confound 3 degrees of
freedom (the number of degrees of freedom between blocks in a
replication). Let us designate the 4 factors as A, B, C, and D, with 1
standing for the low level and 2 the high level of a factor. We cannot
confound on ABCD, because if we select any 3-factor interaction, we shall
automatically confound a main effect by the above rule. Hence we consider
confounding two of the four 3-factor interactions and one 2-factor
interaction, for example, ABC, ABD, and CD. We set up the (+, −)
system for each of these three effects as in Table 20.1. We note that

Table 20.1

                                     Block
Treatment    ABC   ABD   CD      1   2   3   4
1111          −     −     +      X
2111          +     +     +          X
1211          +     +     +          X
2211          −     −     +      X
1121          +     −     −              X
2121          −     +     −                  X
1221          −     +     −                  X
2221          +     −     −              X
1112          −     +     −                  X
2112          +     −     −              X
1212          +     −     −              X
2212          −     +     −                  X
1122          +     +     +          X
2122          −     −     +      X
1222          −     −     +      X
2222          +     +     +          X

there is a + for the treatment having the high level of all the factors in a
particular interaction, − for one low level and the remainder high levels,
etc. Also, CD = (ABC)(ABD). The treatments with the same
combination of (+, −) signs are assigned to the same block. This method can
be used for all $2^n$ designs.
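The block assignment of this example can be generated mechanically from the signs of the three confounded effects. The sketch below assumes the same sign convention as the example (level 2 is +, level 1 is −); the helper name `effect_sign` and the use of a dictionary keyed by sign patterns are illustrative choices.

```python
from itertools import product

FACTORS = "ABCD"
CONFOUNDED = ("ABC", "ABD", "CD")


def effect_sign(treatment, effect):
    """+1 or -1 for `effect` on a treatment coded with levels 1 (low), 2 (high)."""
    s = 1
    for letter in effect:
        s *= 1 if treatment[FACTORS.index(letter)] == "2" else -1
    return s


# The 16 treatments, first factor changing fastest: 1111, 2111, 1211, ...
treatments = ["".join(map(str, reversed(levels)))
              for levels in product((1, 2), repeat=4)]

# Treatments sharing the same (ABC, ABD, CD) sign pattern form one block.
blocks = {}
for t in treatments:
    pattern = tuple(effect_sign(t, e) for e in CONFOUNDED)
    blocks.setdefault(pattern, []).append(t)

for pattern, members in blocks.items():
    print(pattern, members)

# CD is the product of ABC and ABD for every treatment.
cd_is_product = all(
    effect_sign(t, "CD") == effect_sign(t, "ABC") * effect_sign(t, "ABD")
    for t in treatments
)
```

Only four of the eight conceivable sign patterns occur, because the CD sign is fixed once the ABC and ABD signs are chosen.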

Example 20.6. Consider a $3^2$ design with the levels designated as
1, 2, and 3. The 9 treatments can be arranged in a 3 × 3 table.

11    12    13
21    22    23
31    32    33

The AB interaction (4 degrees of freedom) can be split into two
components, each with 2 degrees of freedom. We compute

$I_0 = (11 + 22 + 33)$, $I_1 = (12 + 23 + 31)$, $I_2 = (13 + 21 + 32)$;
$J_0 = (11 + 23 + 32)$, $J_1 = (12 + 21 + 33)$, $J_2 = (13 + 22 + 31)$.

The two components are

$$I = \frac{I_0^2 + I_1^2 + I_2^2}{3} - \frac{G^2}{9}, \qquad J = \frac{J_0^2 + J_1^2 + J_2^2}{3} - \frac{G^2}{9}.$$

If we wanted to use only 3 treatments per block, we could confound
either the I or the J part of the AB interaction by planting the I or J
combination of treatments in blocks. We note that these combinations
do not confound the main effects, since all 3 levels of each factor are
present in each combination. (See reference 3 for the rules when
n > 2.)
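The I and J groupings follow a simple pattern in the level numbers, which can be used to generate them and to confirm that every group contains each level of both factors. A minimal sketch; the modular rules below are one way to reproduce the groups listed in this example and are not a formula quoted from the text.

```python
levels = (1, 2, 3)
I = {k: [] for k in range(3)}
J = {k: [] for k in range(3)}

for i in levels:          # level of the first factor
    for j in levels:      # level of the second factor
        code = f"{i}{j}"
        I[(j - i) % 3].append(code)       # I groups: constant j - i (mod 3)
        J[(i + j - 2) % 3].append(code)   # J groups: constant i + j (mod 3)


def has_all_levels(group):
    """True if each factor appears at levels 1, 2, and 3 within the group."""
    return ({t[0] for t in group} == {"1", "2", "3"}
            and {t[1] for t in group} == {"1", "2", "3"})


main_effects_clear = all(
    has_all_levels(g) for g in list(I.values()) + list(J.values())
)
print(I, J, main_effects_clear)
```

Because every group contains all three levels of each factor, using the groups as blocks leaves the main-effect comparisons balanced within blocks.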

20.8. Summary. We have presented only an introduction to factorial
designs in this chapter, but it is hoped that the reader has been able to see
the usefulness of these designs. Yates³ and Cochran and Cox¹ have an
extensive amount of material on factorial designs, but even they do not
present an exhaustive list; more needs to be done in developing
confounded factorial designs for the engineers and the physical scientists,
who require three or more levels of each treatment and often more sets
of treatments than are used by the natural scientists. Too little attention
has been paid to encouraging experimenters to use three or more
levels of each treatment in an experiment in order to discover whether
or not there is curvilinearity in the response curves. The $2^n$ factorials
have been used because of their simplicity, but they furnish little
information on the response surface.

Some work has been done on fractional replication (references 1 and
11 to 14), in which only part of the treatment combinations are used.
Fractional replication is needed in industrial experimentation, especially
in destructive testing. Too little attention has been paid to the
development of these designs, especially for mixed series, such as a
2 × 3 × 4 experiment.


EXERCISES 

20.1. (a) In Sec. 20.3, show that

$$E(L_1^2)/4r = E(\text{SSA}), \quad E(L_2^2)/4r = E(\text{SSC}), \quad E(L_3^2)/4r = E[\text{SS}(AC)].$$

Hint: Remember that $\alpha_1 + \alpha_2 = 0$, $\gamma_1 + \gamma_2 = 0$, etc.
(b) In Example 20.3, show that

$$E\left[\frac{F_l^2}{24} + \frac{F_q^2}{72}\right] = E(\text{SSF}).$$

20.2. (a) In Example 20.1, what was the estimated efficiency of the
randomized complete-blocks design as compared with a completely
randomized design? Would you consider this to be significantly greater
than 1?

(b) Reanalyze the data, using the following orthogonal sets: $V_1 - V_2$;
$V_1 + V_2 - 2V_3$; $N_2 - N_1$; $2N_3 - (N_1 + N_2)$; interactions of the V
and N comparisons. What does this analysis reveal?

20.3. An experiment was designed to compare five varieties of cowpeas,
at three different spacings, 4, 8, and 12 inches apart in row, with rows
3 feet apart.⁵ For the data, see the accompanying table.

Data of Exercise 20.3
Yield of Cowpea Hay (Pounds per 1/100-Morgen Plot)

                               Blocks
Variety     Spacing     I     II    III    IV     Total    Subtotal
New Era     4"         56     45     43    46      190
            8"         60     50     45    48      203        616
            12"        66     57     50    50      223
34 C 361    4"         61     58     55    56      230
            8"         60     59     54    54      227        674
            12"        59     55     51    52      217
34 C 395    4"         63     53     49    48      213
            8"         65     56     50    50      221        665
            12"        66     58     52    55      231
34 C 402    4"         65     61     60    63      249
            8"         60     58     56    60      234        692
            12"        53     53     48    55      209
34 C 408    4"         60     61     50    53      224
            8"         62     68     67    60      257        773
            12"        73     77     77    65      292
Block totals          929    869    807   815    3,420

(a) Set up a table of means.

(b) Set up the analysis of variance.

(c) Derive the linear and quadratic components of the spacing effect
and the interaction. Graph these results, using spacing as the X variate
and drawing separate lines for each variety.

(d) What is the standard error to compare any 2 of the 15 treatment
means?

(e) Discuss the results.

20.4. A fertility test was made on the growth of grass on Philadelphia
Flat soils in the Manti National Forest with three levels of nitrogen (N)
and three levels of phosphate (P) with two samples of each treatment.
For the data, see the accompanying table. Note that this is not a
randomized-blocks experiment but is analyzed as a completely randomized
design of the type discussed in Sec. 18.2.

Data of Exercise 20.4
Grams of grass

            N0              N1              N2            Total
P0      18.7  17.5      20.8  20.5      22.3  22.9
          36.2            41.3            45.2            122.7
P1      19.2  21.3      18.8  23.5      24.9  24.2
          40.5            42.3            49.1            131.9
P2      20.8  20.5      22.0  24.0      25.6  27.1
          41.3            46.0            52.7            140.0
Total    118.0           129.6           147.0            394.6

(a) Set up the table of means, and make the analysis of variance.

(b) Show that the only important effects are the linear for both N
and P.

(c) Draw a graph similar to Fig. 20.2, and discuss the results.

20.5. Suppose you wish to set up an experiment to test the effectiveness
of 2 levels of nitrogen, 2 levels of phosphate, and 2 levels of potash on the
yield of potatoes and had enough land to plant 80 plots.

(a) Show how you would set up this experiment.

(b) Set up the analysis-of-variance table.

(c) Indicate what kind of information can be obtained from such an
experiment.

(d) If you wanted to know something about the maximum level of the
three fertilizers to use, what changes have to be made in planning another
experiment?

(e) How would you take account of the cost of the fertilizers in making
your recommendations to the farmer?

20.6. A 6 × 6 Latin-square experiment was run at the North Carolina
Agricultural Experiment Station to determine the effect of nitrogen and
phosphate fertilizers on potato yields. The following treatments were
used:

                    Phosphate
Nitrogen      Low    Medium    High
Low            A       B        C
High           D       E        F

The field arrangement and yields (pounds per plot) were:

                                                               Row totals
E 633      B 527      F 652      A 390      C 504      D 416      3,122
B 489      C 475      D 415      E 488      F 571      A 282      2,720
A 384      E 481      C 483      B 422      D 334      F 646      2,750
F 620      D 448      E 505      C 439      A 323      B 384      2,719
D 452      A 432      B 411      F 617      E 594      C 466      2,972
C 500      F 505      A 259      D 366      B 326      E 420      2,376
Col. totals
3,078      2,868      2,725      2,722      2,652      2,614     16,659

(a) Make a complete analysis of this experiment, indicating the treatment
means and effects and linear and quadratic effects. Graph the
results with nitrogen as the X variate and using separate lines for each
phosphate.

(b) What recommendations should be made if it costs $2 per plot extra
for high instead of low nitrogen and $1 per plot extra for each jump in the
amount of phosphate and potatoes sold for $.03 a pound?

(c) Was the Latin square better than a randomized complete-blocks
design?

20.7. An experiment was conducted by C. H. Li to study the effect of
electrolytic chromium plate as a source for the chromium impregnation of
low-carbon steel wire.⁶ Eighteen treatments were considered, using all
combinations of three diffusion temperatures (2200°F, 2350°F, 2500°F),
three diffusion times (4, 8, and 12 hours), and two degassing treatments
[no degassing (0) and degassing (1)]. Each treatment was used on 4
wires, giving a total of 72 wires used. The variable studied was average
resistivity in microhms per cubic centimeter. The average resistivities
are shown in the accompanying table.

Data of Exercise 20.7

Temperature        2200°            2350°            2500°
Degassing         0      1         0      1         0      1

4 hours         18.1   17.9      22.1   21.2      22.9   22.8
                18.9   18.0      20.2   20.4      24.0   22.3
                18.6   18.7      21.3   21.2      23.0   22.7
                19.1   19.0      22.6   21.2      23.0   23.3
   Total        74.7   73.6      86.2   84.0      92.9   91.1

8 hours         19.2   19.2      23.2   22.7      25.5   26.9
                19.3   19.0      21.8   22.7      26.6   26.9
                20.7   20.4      22.9   22.5      25.9   26.3
                20.4   19.2      22.3   22.5      26.8   26.9
   Total        79.6   77.8      90.2   90.4     104.8  107.0

12 hours        20.0   19.9      23.9   23.3      27.0   26.5
                20.2   20.1      23.6   23.5      26.2   26.8
                20.1   20.0      23.2   23.5      25.9   25.4
                20.5   20.8      23.7   22.9      26.9   27.2
   Total        80.8   80.8      94.4   93.2     106.0  105.9

(a) Set up a table of means for the 18 treatment combinations and for
each of the main effects.

(b) Make an analysis of variance with single degrees of freedom
($\Sigma y^2 = 524.2550$).

(c) Determine the standard error for the difference between any 2 of
the 18 treatments.

(d) What are your conclusions regarding these treatments?

20.8. A randomized blocks experiment was set up to test the performance
of 16 treatment combinations in two different blocks, each block
having 4 rows with 4 treatments per row. The original setup was a
factorial experiment. It turned out that only 2 of the main treatments,
designated as L and O, were important. It also became evident that
blocks were badly placed, since there were marked differences in fertility
between the 4 rows, which ran across both blocks. In the analysis of the
difference between the effects of L and O, we are confronted with the
difficulty that L and O did not each appear twice in each row of each block
but sometimes L would appear 3 times and O once, or vice versa. The
yields are presented below:

             Block I                         Block II                         Total
Row     L             O                 L               O                 L       O      L + O
1       84, 70, 81    66                63, 97          56, 64            395     186      581
2       146, 171      148, 137          189             168, 158, 152     506     763    1,269
3       247           179, 218, 228     195, 189        191, 179          631     995    1,626
4       177, 153      123, 166          145, 141, 130   133               746     422    1,168
Total   1,129         1,265             1,149           1,101           2,278   2,366    4,644
        Block I total: 2,394            Block II total: 2,250

(a) Set up the least-squares equations to test whether or not there was
a real difference between L and O after adjusting for row effects and
neglecting the other treatments.

(b) Show that the following analysis of variance is obtained:

Source of Variation    Sum of Squares
Blocks                       648.00
Rows                      70,542.25
Treatments (adj.)          1,313.21
Error                      8,006.04

(c) Fill in the appropriate degrees of freedom in (b), and make a test of
the treatment differences.

(d) Is there any feature of the error which might cause you to doubt
the validity of the analysis?

20.9. Use the method presented in Sec. 15.3 (page 198) to solve for
the values of m, $\{a_i\}$, and $\{c_j\}$ in Example 20.4.

20.10. C. B. Ratchford⁹ investigated the differences in average man
work units per farm for 114 nontractor farms in the coastal plains counties
of North Carolina (1949). He studied three factors which might have
influenced the number of man work units per farm: size of farm (small,
medium, or large), type of farming (tobacco or general), and three types
of rental arrangements. The total man work units were as shown in the
accompanying table (number of farms in parentheses). The total sum
of squares within classes was 1,201,641.

Data of Exercise 20.10

Type of     Size of            Type of rental arrangement
farming     farm          1               2               3               Total
Tobacco     Small      7,770.7 (26)    2,837.1  (9)    5,410.7 (19)    16,018.5  (54)
            Medium     2,492.5  (8)    1,239.6  (3)      359.6  (1)     4,091.7  (12)
            Large      1,654.6  (4)      403.3  (1)             (0)     2,057.9   (5)
            Total     11,917.8 (38)    4,480.0 (13)    5,770.3 (20)    22,168.1  (71)
General     Small      2,081.1  (9)    1,443.7  (4)    2,252.3 (13)     5,777.1  (26)
            Medium     1,256.6  (4)    1,179.0  (3)    2,142.7  (5)     4,578.3  (12)
            Large        680.8  (2)      958.7  (2)      361.1  (1)     2,000.6   (5)
            Total      4,018.5 (15)    3,581.4  (9)    4,756.1 (19)    12,356.0  (43)
Grand total           15,936.3 (53)    8,061.4 (22)   10,526.4 (39)    34,524.1 (114)

(a) Derive the total sum of squares due to the main effects of size, type,
and rental arrangement, not adjusted for interaction. Write the variables
in this order in the matrix: rental arrangement, type, size. Sweep out the
rental constants, and solve for the type and size constants: type (adjusted
for rental) and size (adjusted for rental and type).

(b) Derive the interaction sum of squares, and show that there was no
significant interaction. What does this tell you about the separate
two-factor and three-factor interactions?

(c) Test the effect of size adjusted for rental arrangement and type of
farming.

(d) How would you go about testing the other two adjusted main
effects?

20.11. Read an article by Anderson and Manning¹⁰ for a more complete
discussion of matrix methods with unequal frequencies.

20.12. Use the observation equations for each of the eight subclasses to
set up the normal equations in Example 20.4 and Exercise 20.10.

20.13. Show that the method of least squares produces exact tests for
the main effects, even when interaction is present, if
$n_{ij} = n_{i.}n_{.j}/n$.

20.14. Suppose you studied only small and medium-sized farms in
Exercise 20.10.

(a) Make an analysis by the method of unweighted means in this case.

(b) Does this analysis give you any clue to the separate interactions of
Exercise 20.10?

(c) Why could you not use this method for all farms?

20.15. Given the following data on the number of plants emerging in an
experiment with two levels each of nitrogen (N), phosphate (P), and
potash (K). Four treatments were used per block, with a total of 8
blocks (4 replications), and the NPK interaction was confounded. The
stands were as follows:

NPK        1a     2a     3a     4a     Total
211        31     30     33     28      122
121        25     24     30     19       98
112        21     21     30     24       96
222        66     39     41     36      182
Total     143    114    134    107      498

NPK        1b     2b     3b     4b     Total
111        11      7     19     13       50
212        33     31     36     31      131
122        29     27     31     26      113
221        43     39     36     35      153
Total     116    104    122    105      447

(a) Show that the following analysis is correct:

Source of variation    Degrees of freedom    Mean square
Blocks                  7                        50.10
N                       1                     1,667.53
P                       1                       675.28
K                       1                       306.28
NP                      1                         9.03
NK                      1                        16.53
PK                      1                         3.78
Error                  18                        32.05

(b) Test the significance of the main effects. What about the
interactions?

(c) Show that the main effects and two-factor interactions are not
confounded with block effects and that NPK is completely confounded.

20.16. Two different cultivation methods were also used in the experi¬ 
ment presented in Example 20.3, giving a total of 12 treatments, but only 
6 treatments were planted per block. The treatment effects can be 
divided as follows, with the number of degrees of freedom per effect in 
parentheses: F( 2), 7(1), F7(2), C( 1), FC( 2), 7C(1), F7C(2). The two 
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cultivation methods (ci and c 2 ) were assigned in the following order for 
each block (see the example for F and V): 


F 

V 

Block 

1 

2 

3 

4 

5 

6 

1 

1 

2 

1 

1 

2 

1 

2 

1 

2 

1 

2 

2 

1 

2 

1 

2 

1 

1 

2 

2 

1 

1 

2 

2 

2 

2 

1 

1 

2 

2 

1 

3 

1 

1 

2 

1 

2 

2 

1 

3 

2 

2 

1 

2 

1 

1 

2 


We note that there are three complete replications of the 12 treatments. 
The total yields of three plots for each of the 12 treatments were:


Treatment  111  112  121  122  211  212  221  222  311  312  321  322  Total

Yield      411  444  561  657  479  578  659  645  451  517  546  601  6,549


(a) Check that the above treatment totals are correct. 

(b) Write out 11 treatment effects, using $F_l$ and $F_q$ as before. Show 
that 8 of these are independent of block effects, and show that the sum of 
squares attributable to these 8 effects is 25,977.84. 

(c) Show that $VC$, $F_lVC$, and $F_qVC$ are not independent of block 
effects; these effects are confounded with block effects. 

(d) Show that the sums of squares for the effects in (c) adjusted for 
blocks are 40.5, 154.1, and 129.6, respectively. 

(e) Compute the new error mean square, and make the necessary tests 
of significance. 

CHAPTER 21 

THE ANALYSIS OF COVARIANCE 


21.1. Introduction. In Chaps. 18 to 20, we considered various types 
of experimental designs to estimate treatment effects and to test for 
differences among these treatments. Frequently the experimenter wishes to 
make these estimates and tests on some dependent variate after adjusting 
for the effects of one or more fixed variates. For example, he might wish 
to test the effectiveness of various rations on the gains in weight of hogs 
after adjustments have been made for the initial weights of the hogs. 
Other fixed variates may be used in order to study the effect of the rations 
adjusted for the effects of such external factors as temperature of the pens 
or sunlight. An agronomic experiment might be improved by adjusting 
crop yield for weather and soil conditions or for unequal stand. In an 
educational experiment to test for differences between teaching methods, 
it might be advisable to adjust the results for the mental age or for some 
test score secured before the experiment started. Cochran has discussed 
the theoretical and practical aspects of the analysis of covariance in 
references 1 and 2. R. A. Fisher first introduced the method in reference 3. 

21.2. The Use of Simple Covariance for a Randomized Complete-blocks 
Experiment 

(i) We shall not attempt to present the theory of covariance for all of 
the experimental designs of Chap. 18, since it is believed that the method 
can be adequately demonstrated with the randomized complete-blocks 
type of experiment. As in Sec. 18.3, we shall assume that the 
experimenter has p treatments, each assigned at random to an individual plot in 
each of r blocks. The variate to be estimated is designated as Y, and the 
fixed variate for which Y is to be adjusted is designated as X. The 
experimental model for the yield of the ith treatment in the jth block is 

$$Y_{ij} = \mu + \tau_i^* + \beta_j^* + \beta x_{ij} + \epsilon_{ij} = m + t_i^* + b_j^* + b x_{ij} + e_{ij},$$

where $x_{ij} = X_{ij} - \bar{X}$, and $\tau_i^*$ and $\beta_j^*$ are the treatment and block effects 
adjusted for the effect of the X's. Since $Sx = 0$, it is obvious that 
$\mu^* = \mu$; hence, we shall use $\mu$ and its estimate, m, in the theory which 
follows. As usual 

$$\sum_i \tau_i^* = \sum_j \beta_j^* = 0.$$

The X's are assumed to be fixed, and hence not influenced by the 
treatments. In case X has some effect on the yield (Y) and the experimenter 
was not able to maintain the same value of X for all treatments, it is 
desirable to estimate the treatment effects after the yields have been 
adjusted for the effect of X. If the values of X are actually influenced 
by the treatments but can be measured without error, an analysis of 
covariance can still be run but interpretations are often quite difficult. 
As indicated in Chap. 15, if we estimate $\mu$, $\tau_i^*$, $\beta_j^*$, and $\beta$ by the method of 
least squares, each of the first three estimates will be adjusted for the 
linear effect of X. It should be emphasized at this point that we are here 
considering only the linear effect of X; however, any other function of X 
could be used as the fixed variate, and we might consider several fixed 
variates in a multiple covariance. 

(ii) The least-squares equations for $m$, $t_i^*$, $b_j^*$, and $b$, respectively, are 

$$rpm = SY,$$
$$rm + rt_i^* + bx_{i\cdot} = T_i,$$
$$pm + pb_j^* + bx_{\cdot j} = B_j,$$
$$\sum_i t_i^* x_{i\cdot} + \sum_j b_j^* x_{\cdot j} + bSx^2 = SxY = Sxy,$$

where $x_{i\cdot} = S_j x_{ij} = X_{i\cdot} - r\bar{X}$, $x_{\cdot j} = S_i x_{ij} = X_{\cdot j} - p\bar{X}$, and $T_i$ and $B_j$ 
were defined in Sec. 18.3. In other words, $x_{i\cdot}$ and $x_{\cdot j}$ are treatment and 
block sums for x. It is seen that $\sum_i x_{i\cdot} = \sum_j x_{\cdot j} = 0$. 

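The solutions to the least-squares equations are 

$$m = \bar{Y}, \qquad b = \frac{E_{xy}}{E_{xx}},$$
$$t_i^* = (\bar{T}_i - \bar{Y}) - b\bar{x}_{i\cdot} = t_i - b\bar{x}_{i\cdot},$$
$$b_j^* = (\bar{B}_j - \bar{Y}) - b\bar{x}_{\cdot j} = b_j - b\bar{x}_{\cdot j},$$

where $E_{xx}$ and $E_{xy}$ are similar to SSE but applied to $x^2$ and $xy$, $\bar{x}_{i\cdot} = x_{i\cdot}/r$, 
$\bar{x}_{\cdot j} = x_{\cdot j}/p$, and similarly for $\bar{T}$ and $\bar{B}$. 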

We see that an adjusted treatment effect ($t^*$) is estimated by subtracting 
an adjustment factor from the unadjusted effect ($t$). The adjustment 
factor, $b\bar{x}_{i\cdot}$, is simply the average change in Y for a unit change in X 
multiplied by the difference between the treatment mean of X and 
$\bar{X}$ ($\bar{x}_{i\cdot} = \bar{X}_{i\cdot} - \bar{X}$). Hence each treatment effect is adjusted to the 
average effect if all treatments had operated with the mean value of X. 

It can be shown that 

$$E_{xy} = S(x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot})(y_{ij} - b_j - t_i),$$
$$E_{xx} = S(x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot})^2.$$

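(iii) The above estimates can be put in terms of the parameters and 
$\{\epsilon_{ij}\}$ as follows: 

$$\bar{Y} = \mu + \bar{\epsilon},$$
$$y_{ij} = \tau_i^* + \beta_j^* + \beta x_{ij} + \epsilon_{ij} - \bar{\epsilon},$$
$$\bar{B}_j = \mu + \beta_j^* + \beta\bar{x}_{\cdot j} + \bar{\epsilon}_{\cdot j},$$
$$\bar{T}_i = \mu + \tau_i^* + \beta\bar{x}_{i\cdot} + \bar{\epsilon}_{i\cdot},$$
$$b = \frac{E_{xy}}{E_{xx}} = \beta + \frac{S(x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot})(\epsilon_{ij} - \bar{\epsilon}_{\cdot j} - \bar{\epsilon}_{i\cdot} + \bar{\epsilon})}{E_{xx}} = \beta + \frac{S(x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot})\epsilon_{ij}}{E_{xx}} = \beta + \epsilon_b,$$
$$t_i^* = \tau_i^* + \bar{\epsilon}_{i\cdot} - \bar{\epsilon} - \bar{x}_{i\cdot}\epsilon_b,$$
$$b_j^* = \beta_j^* + \bar{\epsilon}_{\cdot j} - \bar{\epsilon} - \bar{x}_{\cdot j}\epsilon_b,$$

where $\bar{\epsilon} = S\epsilon/rp$, $\bar{\epsilon}_{\cdot j} = S_i\epsilon_{ij}/p$, and $\bar{\epsilon}_{i\cdot} = S_j\epsilon_{ij}/r$. 

Hence $\bar{Y}$, $b$, $t_i^*$, and $b_j^*$ are unbiased estimates of $\mu$, $\beta$, $\tau_i^*$, and $\beta_j^*$, 
respectively. 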

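An unbiased estimate of an adjusted treatment mean is 

$$t_i^* + \bar{Y} = \tau_i^* + \mu + \bar{\epsilon}_{i\cdot} - \bar{x}_{i\cdot}\epsilon_b.$$

Similarly the estimate of the difference, $\delta^*$, between two treatment means 
is 

$$d^* = t_i^* - t_{i'}^* = (\tau_i^* - \tau_{i'}^*) + (\bar{\epsilon}_{i\cdot} - \bar{\epsilon}_{i'\cdot}) - (\bar{x}_{i\cdot} - \bar{x}_{i'\cdot})\epsilon_b.$$

The variance of the difference between two treatment means is 

$$\sigma^2(d^*) = \sigma^2\left[\frac{2}{r} + \frac{(\bar{x}_{i\cdot} - \bar{x}_{i'\cdot})^2}{E_{xx}}\right].$$

In computing this variance, we use $\bar{x}_{i\cdot} - \bar{x}_{i'\cdot} = \bar{X}_{i\cdot} - \bar{X}_{i'\cdot}$. It might be 
noted that in computing this variance, we used $\sigma^2(b) = \sigma^2/E_{xx}$. 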

(iv) From least-squares theory, we know that the residual sum of 
squares is given by 

$$SSE^* = Sy^2 - \sum_i t_i^* T_i - \sum_j b_j^* B_j - bSxy$$
$$= Sy^2 - \frac{1}{r}\left[\sum_i T_i^2 - rC\right] - \frac{1}{p}\left[\sum_j B_j^2 - pC\right] - bE_{xy},$$

where $C = (SY)^2/rp$ and $Sy^2 = SY^2 - C$. But this is the usual error 
sum of squares for the analysis of variance (Sec. 18.3), 

$$SSE = Sy^2 - SSB - SST,$$

minus the reduction due to regression when x and y are adjusted for block 
and treatment effects. We see that 

$$\hat{Y}_{ij} = \bar{Y} + t_i^* + b_j^* + bx_{ij} = \mu + \tau_i^* + \beta_j^* + \beta x_{ij} + \bar{\epsilon}_{i\cdot} + \bar{\epsilon}_{\cdot j} - \bar{\epsilon} + \epsilon_b(x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j}).$$

Hence the residual is 

$$e_{ij} = Y_{ij} - \hat{Y}_{ij} = (\epsilon_{ij} - \bar{\epsilon}_{i\cdot} - \bar{\epsilon}_{\cdot j} + \bar{\epsilon}) - \epsilon_b(x_{ij} - \bar{x}_{i\cdot} - \bar{x}_{\cdot j}).$$

The expected value of $SSE^* = Se_{ij}^2$ is 

$$(r - 1)(p - 1)\sigma^2 - \sigma^2 = [(r - 1)(p - 1) - 1]\sigma^2.$$

Therefore $s^{*2} = SSE^*/[(r - 1)(p - 1) - 1]$ is an unbiased estimate of 
$\sigma^2$.† This estimated variance can be used to set up confidence limits for 
differences between adjusted treatment means. 

The added reduction due to treatments is found by omitting the 
treatment constants from the model equation. The new residual sum of 
squares is 

$$(SSE^*)' = Sy^2 - SSB - b'E'_{xy},$$

where 

$$b' = \frac{E'_{xy}}{E'_{xx}} = \frac{S(x_{ij} - \bar{x}_{\cdot j})(y_{ij} - b_j)}{S(x_{ij} - \bar{x}_{\cdot j})^2}.$$

This is simply $SSE'$ ($= Sy^2 - SSB$) minus the reduction due to regression 
when x and y are adjusted for block effects only. The added reduction 
due to treatments is 

$$SST^* = (SSE^*)' - SSE^*.$$

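Also, 

$$\hat{Y}'_{ij} = \bar{Y} + b_j^{*\prime} + b'x_{ij},$$
$$e'_{ij} = \tau_i^* + (\epsilon_{ij} - \bar{\epsilon}_{\cdot j} + \bar{\epsilon}) - \epsilon'_b(x_{ij} - \bar{x}_{\cdot j}),$$

where $\epsilon'_b = S(x_{ij} - \bar{x}_{\cdot j})\epsilon_{ij}/S(x_{ij} - \bar{x}_{\cdot j})^2$. Hence the expected value of 
$(SSE^*)'$ is 

$$r\sum_i(\tau_i^*)^2 + [r(p - 1) - 1]\sigma^2.$$

Since $SST^* = (SSE^*)' - SSE^*$, the expected value of $SST^*$ is 

$$r\sum_i(\tau_i^*)^2 + (p - 1)\sigma^2.$$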


† $s^{*2}$ is usually designated as $s^2_{y \cdot x}$. 

Hence 

$$E(MST^*) = \frac{r\sum_i(\tau_i^*)^2}{p - 1} + \sigma^2.$$

Under the null hypothesis ($\tau_i^* = 0$), $SST^*$ is distributed as $\chi^2\sigma^2$ with 
$(p - 1)$ degrees of freedom and independently of $SSE^*$, which is 
distributed as $\chi^2\sigma^2$ with $(rp - r - p)$ degrees of freedom. Hence 

$$F = \frac{SST^*/(p - 1)}{SSE^*/(rp - r - p)} = \frac{MST^*}{s^{*2}},$$

with $(p - 1, rp - r - p)$ degrees of freedom, can be used to test the 
above null hypothesis. 

(v) The analysis of covariance table is: 


Source of        Degrees of        Original sums            Adjusted results
variation         freedom        y²     xy      x²       d.f.                 SS       MS

Total             rp - 1         Sy²    Sxy     Sx²
Blocks            r - 1          SSB    B_xy    B_xx
Treatments        p - 1          SST    T_xy    T_xx

Error          (r - 1)(p - 1)    SSE    E_xy    E_xx     (r - 1)(p - 1) - 1   SSE*     s*²
(Error)'          r(p - 1)       SSE'   E'_xy   E'_xx    r(p - 1) - 1         (SSE*)'

Adjusted treatments = (error)' - (error)                 p - 1                SST*     MST*

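In this analysis, 

(Error)' = error + treatments, 

$$B_{xy} = \frac{1}{p}\sum_j B_j x_{\cdot j} = \frac{\sum_j B_j X_{\cdot j}}{p} - \frac{(SY)(SX)}{rp},$$

$$T_{xy} = \frac{1}{r}\sum_i T_i x_{i\cdot} = \frac{\sum_i T_i X_{i\cdot}}{r} - \frac{(SY)(SX)}{rp},$$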

and similarly $B_{xx}$ and $T_{xx}$ are block and treatment sums of squares for x. 

(vi) The analysis of covariance has two main uses: 

(a) To reduce the error variance by eliminating the plot-to-plot 
variation attributable to fluctuations in the fixed variate, 

(b) To eliminate any bias in treatment comparisons caused 
by an uneven distribution of the fixed variate to the various 
treatments. 


The efficiency (I) of covariance in reducing the error variance is 
given by the ratio of the variance of the difference between two 
unadjusted treatment means ($2s^2/r$) to the average variance of the difference 
between two adjusted treatment means.† Finney 4 shows that the 
average variance of the difference between two adjusted means is 

$$\bar{s}^2(d^*) = \frac{2s^{*2}}{r}\left[1 + \frac{T_{xx}}{(p - 1)E_{xx}}\right].$$

Hence 

$$I = \frac{s^2}{s^{*2}\left[1 + \dfrac{T_{xx}}{(p - 1)E_{xx}}\right]}.$$

It is rather difficult to assess the effectiveness of covariance in 
eliminating the effect of X on the treatment means. It has sometimes 
been stated that one might test for treatment differences in X by use of 
$F_x = (r - 1)T_{xx}/E_{xx}$. If $F_x$ is not significant, the experimenter is told 
that he can attribute adjusted yield differences to the treatments; 
however, if $F_x$ is significant, he is advised to be cautious, because adjusted 
yield differences might be attributed to differences in X. Actually the 
experimenter should consider whether treatment differences in X were 
inherent in the treatments (such as poor germination resulting in a low 
stand) or were the result of external circumstances. If the latter is the 
case, then covariance should be used to eliminate a bias in estimating 
treatment differences. But if the treatments actually produce 
differences in X, the experimenter should take this into account in making 
recommendations. 

Example 21.1. Snedecor 6 presents an example of the analysis of 
covariance of the yield of sugar beets (Y) adjusted for stand (X). The 
yields in tons per acre and the stand in numbers of beets per plot are 
presented in Table 21.1. 

† Since there is a loss of only 1 degree of freedom in adjusting for X, we need not 
worry about this feature unless there are very few degrees of freedom for error. 

Table 21.1 


Fertilizer           (i)  Stand                   Block (j)                     Treatment sums and means
applied                   and
                          yield     1      2      3      4      5      6        Sum        Mean

None                  1   Stand    183    176    291    254    225    249      1,378      229.7
                          Yield   2.45   2.25   4.38   4.35   3.42   3.27      20.12      3.353

Superphosphate, P     2   Stand    356    300    301    271    288    258      1,774      295.7
                          Yield   6.71   5.44   4.92   5.23   6.74   4.74      33.78      5.630

Muriate of            3   Stand    224    258    244    217    192    236      1,371      228.5
potash, K                 Yield   3.22   4.14   2.32   4.42   3.28   4.00      21.38      3.563

P + K                 4   Stand    329    283    308    326    318    318      1,882      313.7
                          Yield   6.34   5.44   5.22   8.00   6.96   6.96      38.92      6.487

P + sodium            5   Stand    371    354    352    331    290    410      2,108      351.3
nitrate, N                Yield   6.48   7.11   5.88   7.54   6.61   8.86      42.48      7.080

K + N                 6   Stand    230    221    237    193    247    250      1,378      229.7
                          Yield   3.70   3.24   2.82   2.15   5.19   4.13      21.23      3.538

P + K + N             7   Stand    322    367    400    333    314    385      2,121      353.5
                          Yield   6.10   7.68   7.37   7.83   7.75   7.39      44.12      7.353

Block sums                X.j    2,015  1,959  2,133  1,925  1,874  2,106     12,012      286.0
                          B_j    35.00  35.30  32.91  39.52  39.95  39.35     222.03      5.286


The computations are as follows: 


SX² = 3,587,590          SXY = 67,664.27            SY² = 1,316.1479
(SX)²/42 = 3,435,432     (SX)(SY)/42 = 63,500.58    (SY)²/42 = 1,173.7457
Sx² = 152,158            Sxy = 4,163.69             Sy² = 142.4022

$$E_{xy} = 4{,}163.69 - \left[\frac{(2{,}015)(35.00) + \cdots + (2{,}106)(39.35)}{7} - 63{,}500.58\right] - \left[\frac{(1{,}378)(20.12) + \cdots + (2{,}121)(44.12)}{6} - 63{,}500.58\right]$$
$$= 4{,}163.69 + 116.56 - 3{,}598.05 = 682.20.$$

$$E_{xx} = 28{,}665.10, \qquad b = \frac{E_{xy}}{E_{xx}} = .023799.$$

$$SSE^* = Sy^2 - SSB - SST - bE_{xy} = 142.4022 - 6.3134 - 112.8562 - 16.2357 = 6.9969.$$

The results are presented in the following analysis-of-covariance table: 


Source of      Degrees of          Original sums                   Adjusted results
variation       freedom       Sy²        Sxy         Sx²           SS      d.f.    MS

Total             41       142.4022   4,163.69   152,158.00
Blocks             5         6.3134    -116.56     7,472.57
Treatments         6       112.8562   3,598.05   116,020.33

Error             30        23.2326     682.20    28,665.10      6.9969     29    .2413
(Error)'          36       136.0888   4,280.25   144,685.43      9.4655     35

Adjusted treatments = (error)' - error                           2.4686      6    .4114


$$F = \frac{.4114}{.2413} = 1.70, \qquad \alpha > .10.$$

Since F is not significant, we conclude that the treatments did not differ 
in their mean yields after adjusting for stand. This experiment falls in 
the uncertain class, because the stand is a result of the experiment and 
hence may be a treatment characteristic. For example, a given 
treatment may produce a good germination or a good start on the part of the 
plant so that its main contribution to yield may be in producing a good 
stand. If this is so, an adjustment for stand will cancel out the treatment 
effects. A test of $T_{xx} = 116{,}020.33$ against $E_{xx} = 28{,}665.10$ gives 
$F_x = 20.24$, a highly significant value. This indicates that there were 
actually differences in stand from one treatment to another. 
Furthermore we note that a test of the treatment effects on yield not adjusted for 
stand gives 

$$F = \frac{112.8562/6}{23.2326/30} = 24.29,$$

also a highly significant value. Hence we conclude that there were 
definite treatment effects on yield but that these effects probably were 
arrived at indirectly through different stands, which then resulted in 
different yields. One further complication might be mentioned. Yields 
for low stand are often higher per plant than for high stand because of 
less competition for the available plant food and moisture. This 
apparently did not happen in our sugar beet example but in general should be 
considered. Also, it should be noted again that we considered only the 
linear effect of stand on yield; often the effect may be curvilinear. A 
final point is that although the treatments did affect the stand, the stand 
could be measured without error. 

In this case, there is little need to set up the average variance of adjusted 
treatment differences, but we present it to complete the example: 

$$\bar{s}^2(d^*) = \frac{.4826}{6}\left(1 + \frac{19{,}337}{28{,}665}\right) = .135.$$

The exact variance for $(t_1^* - t_2^*)$, for example, is 

$$.2413\left[\frac{2}{6} + \frac{(66)^2}{28{,}665}\right] = .117.$$

The unadjusted and adjusted treatment means are: 


                                   Treatment
Means           1       2       3       4       5       6       7       All

Unadjusted    3.353   5.630   3.563   6.487   7.080   3.538   7.353    5.286
Adjusted      4.693   5.399   4.931   5.828   5.526   4.878   5.746    5.286

The variance of an unadjusted mean difference would have been 

$$\frac{2s^2}{r} = \frac{2(.7744)}{6} = .258.$$

Hence I = .258/.135 = 1.91. 
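
The adjusted means, the two variances, and the efficiency quoted in this example can be reproduced from the summary values already given. The sketch below is only a check on the arithmetic under those rounded inputs; the variable names are illustrative and not part of the text.

```python
# Summary values taken from Example 21.1 (rounded as printed in the text).
b = 0.023799
x_bar = 286.0
x_means = [229.7, 295.7, 228.5, 313.7, 351.3, 229.7, 353.5]
y_means = [3.353, 5.630, 3.563, 6.487, 7.080, 3.538, 7.353]
r, p = 6, 7                       # blocks, treatments
s2 = 23.2326 / 30                 # unadjusted error mean square
s2_adj = 6.9969 / 29              # adjusted error mean square, s*^2
Txx, Exx = 116020.33, 28665.10

# Adjusted treatment mean: unadjusted mean minus b times the deviation in X.
adjusted = [y - b * (x - x_bar) for x, y in zip(x_means, y_means)]

var_unadjusted = 2 * s2 / r
avg_var_adjusted = 2 * s2_adj / r * (1 + Txx / ((p - 1) * Exx))
efficiency = var_unadjusted / avg_var_adjusted

# Exact variance of the difference between adjusted means 1 and 2.
exact_var_12 = s2_adj * (2 / r + (x_means[1] - x_means[0]) ** 2 / Exx)

print([round(a, 3) for a in adjusted])
print(round(var_unadjusted, 3), round(avg_var_adjusted, 3), round(efficiency, 2))
```

Because the inputs are rounded, the last digit may differ slightly from the printed table; the efficiency comes out a little above the 1.91 obtained in the text from the rounded ratio .258/.135.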

21.3. The Use of Simple Covariance for Other Experimental Designs. 

No attempt will be made in this book to outline in detail the 
computational procedures for the use of simple covariance with other experimental 
designs. The computing procedure for other complete-blocks designs is 
exactly the same as for randomized blocks, namely: 

(i) Set up the usual analysis of variance given in Chap. 18, but include 
the sum of squares for x and the sum of cross products as well as the sum 
of squares for y for each line in the analysis. 

(ii) Compute the error line by subtraction for $SSE$, $E_{xy}$, and $E_{xx}$. 

(iii) Compute $SSE^* = SSE - bE_{xy} = SSE - E_{xy}^2/E_{xx}$. $MSE^* = s^{*2}$. 

(iv) Compute the (error + treatment) = (error)' line by adding the 
treatment line to the error line. 

(v) $(SSE^*)' = SSE' - b'E'_{xy}$. 

(vi) Compute the adjusted sum of squares for treatments, 

$$SST^* = (SSE^*)' - SSE^*, \qquad MST^* = SST^*/(p - 1).$$

(vii) $F = MST^*/s^{*2}$. 

(viii) $t_i^* + \bar{Y} = \bar{T}_i - b(\bar{X}_{i\cdot} - \bar{X})$. 

(ix) $\bar{s}^2(d^*) = \dfrac{2s^{*2}}{r}\left[1 + \dfrac{T_{xx}}{(p - 1)E_{xx}}\right]$. 
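
Steps (ii) to (vii) above can be sketched as a short function that works from the error and treatment lines of the table in step (i). This is an illustration under stated assumptions, not a general program: the function name `adjusted_analysis` and its argument layout (each line given as a tuple of the y², xy, and x² sums) are hypothetical, and only a single fixed variate is handled.

```python
def adjusted_analysis(error, treatments, p, error_df):
    """Simple-covariance adjustment from the (Sy2, Sxy, Sx2) error and treatment lines."""
    sse, exy, exx = error
    b = exy / exx
    sse_adj = sse - exy ** 2 / exx                       # step (iii)
    s2_adj = sse_adj / (error_df - 1)                    # one d.f. lost for b

    # Step (iv): the (error)' line is error + treatments, term by term.
    sse_p, exy_p, exx_p = (e + t for e, t in zip(error, treatments))
    sse_p_adj = sse_p - exy_p ** 2 / exx_p               # step (v)

    sst_adj = sse_p_adj - sse_adj                        # step (vi)
    mst_adj = sst_adj / (p - 1)
    return {
        "b": b,
        "SSE*": sse_adj,
        "s*2": s2_adj,
        "(SSE*)'": sse_p_adj,
        "SST*": sst_adj,
        "MST*": mst_adj,
        "F": mst_adj / s2_adj,                           # step (vii)
    }


# Error and treatment lines of Example 21.1 (7 treatments, 30 error d.f.).
result = adjusted_analysis(
    error=(23.2326, 682.20, 28665.10),
    treatments=(112.8562, 3598.05, 116020.33),
    p=7,
    error_df=30,
)
for name, value in result.items():
    print(f"{name:8s} {value:.4f}")
```

Applied to the lines of Example 21.1, the function should return values agreeing with the analysis-of-covariance table of that example to within rounding.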


If the experimenter wishes to test a single degree of freedom of the 
$(p - 1)$ treatment degrees of freedom, he must go through the same 
procedure as outlined above with this one degree of freedom replacing the 
treatment line in the computations. If several of these single components 
are desired, an approximate method of computation has been suggested 
by Cochran and Cox. 1 Compute 

$$S(y - bx)^2 = Sy^2 - 2bSxy + b^2Sx^2$$

for each component of the treatment sum of squares. This computation 
would give the correct value of $SSE^*$ but overestimates each of the 
components of $SST^*$. However, the bias is generally small, and if the 
experiment is a complicated factorial, much labor is saved in testing the various 
main effects and interactions. 
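
To see how the approximation behaves, the expression $Sy^2 - 2bSxy + b^2Sx^2$ can be applied to the error line and to the whole treatment line of Example 21.1. This is a hedged illustration with an assumed helper name, `approx_adjusted_ss`; in practice the method would be applied to single-degree-of-freedom components rather than to the full treatment line.

```python
def approx_adjusted_ss(sy2, sxy, sx2, b):
    """Approximate adjusted sum of squares, S(y - bx)^2, for one line of the table."""
    return sy2 - 2 * b * sxy + b * b * sx2


b = 0.023799                      # error regression coefficient, Example 21.1
error_line = (23.2326, 682.20, 28665.10)
treatment_line = (112.8562, 3598.05, 116020.33)

approx_error = approx_adjusted_ss(*error_line, b)
approx_treatments = approx_adjusted_ss(*treatment_line, b)
exact_treatments = 2.4686         # SST* from the exact analysis

print(f"error line:      {approx_error:.4f}")
print(f"treatment line:  {approx_treatments:.4f} (exact SST* = {exact_treatments})")
```

On the error line the expression reproduces the adjusted error sum of squares, as stated above. On the treatment line it gives roughly 7.3 against the exact 2.4686, so in this example, where stand differs strongly among treatments, the overestimate is far from small.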

See reference 6 for the use of covariance with lattice designs. 

21.4. The Use of Multiple Covariance. 

$$Y_{ij} = \mu + \tau_i^* + \beta_j^* + \sum_{k=1}^{m} \beta_k x_{kij} + \epsilon_{ij}.$$

Snedecor 5 presents an example of a randomized blocks experiment on 
wheat yields in Great Britain with two fixed variates (height of shoots at 
ear emergence and number of plants at tillering). The treatments were 
6 different places, and the blocks were 3 different years. The computing 
procedure is as follows for m fixed variates: 

(i) Derive a table of sums of squares and cross products for total, 
blocks, and treatments (and any other component in the analysis, such 
as columns in a Latin square). 

(ii) Compute the error line for each sum of squares and cross products 
by subtracting blocks, treatments, and any other component from the 
total. Use these error values to compute the $\{b_k\}$ and $R_e^2$, the squared 
multiple correlation coefficient for the error line. 

(iii) Compute an (error + treatment) line and $R_{e+t}^2$. 

(iv) $SSE^* = SSE - R_e^2\,SSE = SSE(1 - R_e^2)$. 

$(SSE^*)' = SSE'(1 - R_{e+t}^2)$. 

(v) $s^{*2} = SSE^* \div$ (error d.f. $-$ number of fixed variates). 

$MST^* = [(SSE^*)' - SSE^*]/(p - 1)$. 

$F = MST^*/s^{*2}$. 

(vi) $t_i^* + \bar{Y} = \bar{T}_i - \sum_{k=1}^{m} b_k(\bar{X}_{ki\cdot} - \bar{X}_k)$, where $b_k$ is estimated in (ii). 

(vii) If estimates of the variances of adjusted treatment differences are 
wanted, the b's should be derived by a matrix-inversion method in order 
to obtain the variances and covariances of the b's. 

$$s^2(t_i^* - t_{i'}^*) = s^{*2}\left[\frac{2}{r} + \sum_k\sum_{k'} c_{kk'}(\bar{X}_{ki\cdot} - \bar{X}_{ki'\cdot})(\bar{X}_{k'i\cdot} - \bar{X}_{k'i'\cdot})\right],$$

where $c_{kk'}$ is the element in the inverse matrix of the error line. The 
average variance of adjusted treatment differences is 

$$\frac{2s^{*2}}{r}\left[1 + \frac{\sum_k\sum_{k'} c_{kk'}T_{kk'}}{p - 1}\right],$$

where $T_{kk'}$ is the sum of cross products for $(x_k x_{k'})$ in the treatment line 
of (i), 

$$T_{kk'} = rS_i(\bar{X}_{ki\cdot} - \bar{X}_k)(\bar{X}_{k'i\cdot} - \bar{X}_{k'}).$$

EXERCISES 

21.1. Derive these relationships for $E_{xy}$ and $E_{xx}$: 

$$E_{xy} = S(x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot})(y_{ij} - b_j - t_i),$$
$$E_{xx} = S(x_{ij} - \bar{x}_{\cdot j} - \bar{x}_{i\cdot})^2.$$

21.2. (a) Derive Finney's result for the average variance of the 
difference between two adjusted treatment means, presented in Sec. 21.2. 
(b) Also, derive the result for multiple covariance given in Sec. 21.4. 

21.3. (a) Make a complete least-squares solution of a simple covariance 
analysis for a Latin-square design. 

(b) Illustrate the results in (a) by analyzing the following 5 × 5 
Latin-square experiment on the yield in bags per acre of No. 1 Irish potatoes 
(Y), adjusted for the percentage of No. 1's (X). The treatments were 
different amounts (pounds) of P₂O₅ per acre: a = 0, b = 40, c = 80, 
d = 120, e = 160. 


                                        Columns
Rows         1              2              3              4              5             Total
         t    Y     X   t    Y     X   t    Y     X   t    Y     X   t    Y     X      Y       X

1        a  134.0  91   b  149.1  88   c  141.3  87   d  161.3  91   e  149.2  91    734.9    448
2        b  148.5  90   d  148.5  91   e  199.3  94   a  148.5  90   c  152.7  93    797.5    458
3        c  145.2  93   e  149.5  95   a  119.9  90   b  149.2  94   d  145.8  90    709.6    462
4        d  171.1  91   c  169.0  94   b  144.9  89   e  170.8  95   a  130.4  88    786.2    457
5        e  175.8  91   a  153.4  94   d  168.9  92   c  167.6  96   b  141.5  93    807.2    466

Total       774.6 456      769.5 462      774.3 452      797.4 466      719.6 455  3,835.4  2,291


SY² = 595,038.38, SXY = 351,944.8, SX² = 210,085. 


Show that b = 2.0616 and $s^{*2}$ = 127.18. 

(c) Analyze the linear component of the treatment mean square by 
the exact method and by the Cochran and Cox approximation presented 
in Sec. 21.3. Then show that the deviations from a linear trend are not 
significant. 

21.4. Consider a single degree of freedom to test for the adjusted effect 
of phosphate in the sugar beet example (Example 21.1): 

$$t = -t_1 + t_2 - t_3 + t_4 - t_6 + t_7.$$

(a) Use the exact procedure to show that the effect of phosphate is 
significant even after adjusting for stand. 

(b) Show that the approximate procedure presented in Sec. 21.3 cannot 
be used in this case. 

21.5. The vitamin B₂ example used in Chap. 15 and Exercise 18.6 can 
also be analyzed by covariance. 

(a) Analyze the effects of the three soil moistures on the amount of 
vitamin B₂ after adjusting for the effect of $X_1$. 

(b) Analyze the effect of these three soil moistures after adjusting for 
$X_1$ and $X_3$. 

(c) Can the fixed variates be considered independent of the soil 
moistures? 

21.6. Johnson and Tsao 7 analyzed the influence of sex, scholastic 
standing, individual order, and grade on education development as measured 
by the Iowa Tests of Education Development. There were 3 of each of the 
last 3 variables and the 2 sexes, giving a total of 54 treatment 
combinations. An initial score and the mental age of each student were 
determined before the development tests were administered. Only one student 
was tested for each treatment combination. The final scores (Y), 
initial scores ($X_1$), and mental age ($X_2$) were as shown in the 
accompanying table: 


Data of Exercise 21.6 


         Scholastic   Individual     Grade 10         Grade 11         Grade 12
Sex      standing     order         Y   X1   X2      Y   X1   X2      Y   X1   X2

Male     Good         1            30   28   45     26   22   62     29   25   60
                      2            25   22   58     26   21   57     29   24   88
                      3            22   19   46     24   21   65     22   19   64

         Average      1            26   22   56     24   25   54     23   21   64
                      2            17   14   19     23   18   55     20   17   47
                      3            14   14   29     15   13   24     19   17   75

         Poor         1            18   18   34     18   17   40     17   16   29
                      2            17   14   17     16   13   24     15   15   38
                      3            12    9   19     13   12   23     14   12   28

Female   Good         1            21   16   44     26   22   60     33   29   94
                      2            21   21   44     25   22   57     29   29   89
                      3            19   17    6     23   19   52     25   22   78

         Average      1            20   18   38     22   19   54     23   21   50
                      2            18   16   27     21   19   54     18   19   57
                      3            14   14   18     17   16   52     17   17   43

         Poor         1            14    9   18     19   17   40     15   13   36
                      2            12    7   18     15   12   28     15   14   35
                      3             9    7    5     13   12   48     10    9   14


SY = 1,068,          SX₁ = 944,           SX₂ = 2,379, 

SY² = 22,730,        SX₁² = 17,926,       SX₂² = 127,639, 

SX₁Y = 20,116,       SX₂Y = 52,005,       SX₁X₂ = 46,227. 


(a) Set up an analysis of simple covariance on Y, using $X_2$ as the fixed 
variate with the main effects of sex, scholastic standing, order, and grade 
pulled out plus all two-factor interactions. Pool the three-factor and 
four-factor interactions as error (see Sec. 20.2). 

(b) Repeat (a), but with both dependent variates. 

21.7. Covariance was used to reduce the experimental error in a 
randomized blocks experiment on scuppernong grapes. 8 Four blocks and 5 
magnesium treatments were used. Four plants were used per plot. The 
yields were in terms of pounds per plant. Two fixed variates were used 
to estimate yielding ability prior to treatment: $X_1$ = a score of 1 to 5 
given by the investigators as to the vigor and size of each of the eight 
arms on each plant; $X_2$ = diameter of each arm at a point 18 inches from 
the crown. For both X's, the individual arm measurements were 
cumulated for each plant. The sum of squares and cross products were 
as follows: 


              d.f.      Sy²       Sx₁y      Sx₂y     Sx₁x₂      Sx₁²      Sx₂²

Total          19    1,001.90    306.25    629.55    308.64    154.52    963.44
Blocks          3      223.50     39.81    -85.95      9.07     20.64     77.19
Treatments      4      144.60     64.54    246.90    113.11     32.89    436.32


(a) Complete the analysis of covariance using only $X_1$. What was the 
efficiency of the covariance analysis? 

(b) Repeat (a) using $X_2$ also. 

The treatment means per plant were as follows: 


           I        II       III      IV       V       Total

Y        13.79    20.28    17.77    19.75    14.33    17.18
X₁       19.56    22.88    22.56    22.56    20.81    21.67
X₂       64.44    75.06    72.56    73.69    64.31    70.01


Derive the adjusted treatment means in (a) and (b) and the average 
standard error of the difference between two adjusted means. 

21.8. Covariance can also be used to make an analysis of variance when 
one or more of the plots are missing: this is necessary only when a double 
or higher restriction is made on the design, as with a randomized 
complete-blocks or a Latin-square design. The procedure is to set Y = 0 
and X = −1 for the missing plot and X = 0 elsewhere. If there are 
several missing plots, multiple covariance must be used, with $X_1 = -1$ 
for the first one, $X_2 = -1$ for the second, etc. Use the method of 
covariance to make the analysis of Exercise 18.12. For the application of these 
methods to other designs, see reference 9. 

21.9. For the application of covariance techniques to disproportionate 
frequency problems, see reference 10. 

21.10. The analysis of covariance is often used to determine whether 
or not the same regression coefficient ($\beta$) applies to all treatments. 
Snedecor 6 presents an example of this type of analysis. Use the vitamin 
B₂ data [Exercise 21.5(a)] to compute a separate b (for $X_1$) for each of the 
three soil moistures. Then set up the following analysis, where $b_i$ is the 
regression coefficient of Y on $X_1$ for each of the three treatments (i = 1, 2, 3). 


Source of variation                          Degrees       SS                  MS
                                             of freedom

Deviations from average regression (b)          23         SSE*
Deviations from individual regressions          21         (SSE)''             s''²

Differences among individual regressions         2         SSE* - (SSE)''      [SSE* - (SSE)'']/2


$$F = [SSE^* - (SSE)'']/2s''^2,$$

$$(SSE)'' = Sy^2 - SST - [b_1S_1x_1y + b_2S_2x_1y + b_3S_3x_1y],$$

where $S_i$ applies to summation over treatment i. 

21.11. In Example 21.1, neglect the block effects and apply the 
methods of Exercise 21.10 to determine a separate regression line for 
each treatment. Graph these data, and draw in each regression line 
and the over-all regression line. 

CHAPTER 22 


VARIANCE COMPONENTS: ALL RANDOM COMPONENTS, 
EXCEPT THE MEAN 

22.1. Introduction. The regression models used in the previous 
chapters of Part II assumed that all variates were fixed except for a single 
random error term. Most experiments are designed so that several 
components are random instead of all except one being fixed. That is, the 
blocks in a randomized-blocks design may be assumed to be randomly 
selected from a large population of blocks, so that the block-to-block part 
of the analysis of variance is also a random component. And in sampling 
experiments and surveys and in genetics experiments, the model almost 
always postulates several random components: for example, (i) a sample 
of soils to determine the basic sources of variability, such as plot-to-plot 
differences, sample-to-sample differences in the same plot, and laboratory 
determination errors; (ii) a sample survey covering an entire region with a 
few counties selected from the large number of counties in the region, then 
a few areas selected from each sample county, and perhaps only one or 
two families from each selected area; (iii) a corn-breeding experiment with 
parents taken from a population produced by random mating, the progeny 
randomly assigned to plots, and a random selection of plants from each 
plot. 

The regression model using several sources of random variation can take 
on one of two forms: (a) every variate, except the general mean, is a 
random variate, or (b) there is a mixture of random and fixed variates. 
Eisenhart 1 has presented the basic difference between (a) and the model 
used in the previous chapters of Part II and has indicated the importance 
of (b) without delving into the many theoretical difficulties involved in its 
use. Crump 2 presents the basic theory for (a) and includes an extensive 
bibliography of the use of components of variance. Crump 3 presents a 
more recent bibliography of articles on variance components and 
discusses problems which merit further investigation. R. A. Fisher 4 
indicated the additive properties of variances in the first edition of his 
Statistical Methods for Research Workers. Yates and Zacopanay 5 
indicated the application of these methods to field sampling; among others, 
Cochran 6 extended them to enumerative surveys. Much of the recent 
development in the field of quantitative genetics has been built on a 
variance-components model, as discussed in references 7 and 8. 

One of the authors of this book has presented the basic theory required 
for the mixed model (b) in an article on the analysis of price data. 9 The 
reader is cautioned that this last article has many drawbacks from an 
economic point of view, but the difficulties encountered in the use of a 
mixed model are adequately outlined therein. This article drew much 
of its theory from articles by Daniels 10 and Satterthwaite. 11 The mixed 
model is also required in the analysis of a series of experiments, such as 
those conducted over several years and at several places. Cochran and 
Cox 12 indicate some of the difficulties for this type of experimentation. 

We shall discuss general uses of model (a) in this chapter, mixed models 
in Chap. 23, and the use of variance components for balanced 
incomplete-blocks designs and lattice designs in Chap. 24 and shall give a general 
summary in Chap. 25. 

22.2. A Randomized-blocks Model. Let us assume that we have p 
treatments, each allocated to each of r blocks, and q samples taken from 
each of the pr plots, assuming that in each case the particular treatment, 
block, and sample are random samples from infinite (at least, very 
large) populations.† Hence there are prq samples. The model for the 
yield of the kth sample from the ith treatment on the jth block is 

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk},$$

where $\tau_i$, $\beta_j$, $(\tau\beta)_{ij}$, and $\epsilon_{ijk}$ are all assumed to be NID with means 0 and 
variances $\sigma_\tau^2$, $\sigma_\beta^2$, $\sigma_{\tau\beta}^2$, and $\sigma_\epsilon^2$, respectively. Again it should be made clear 
that for purposes of estimation, the assumption of normality is not 
needed. This assumption is required, however, if the usual tests of 
significance and confidence limits are used. These variances are called 
variance components. Hence 

$$E(Y_{ijk}) = \mu, \qquad \sigma^2(Y_{ijk}) = \sigma_\tau^2 + \sigma_\beta^2 + \sigma_{\tau\beta}^2 + \sigma_\epsilon^2.$$

An experiment of this kind is set up to estimate the mean, ix, and the 
variance of this estimate and/or to obtain estimates of the variance com¬ 
ponents themselves. The statistical geneticists, for example, are inter¬ 
ested mainly in the variance components. If the experiment was set up 
to obtain an estimate of it is understood that the estimate, Y, will 
deviate from (x because only a sample of treatments, blocks, and plots (or 
samples from the subclasses) is used in the experiment. In other words, 
from one experiment to another, one could expect different values of the 

t Or we might consider r blocks with pq plots per block, each treatment being 
assigned at random to q plots per block. Although the two types of experiments are 
analyzed the same, the interpretations are different. The procedure given in the 
body of the text produces an estimate of sample-to-sample fluctuation in the same 
plot, while the one mentioned in this footnote produces an estimate of the plot-to-plot 
variation within a block. 
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{ T< }, {ft}, {rfti}, and {e#*} to appear. The results of the experiment can 
be used to estimate the variability of these random variables and hence 
to furnish an estimate of the variance of Y. This estimate of <r 2 (f) can 
be used to construct confidence limits for ju and to indicate how changes in 
future experimental plans might affect the precision of the estimate. The 
reader will note that this type of experimentation is fundamentally differ¬ 
ent from that described in the previous four chapters, even though the 
same analysis is used. In those chapters we were interested in estimating 
certain treatment effects under very limited experimental conditions and 
in making tests regarding the differences between these effects. Here we 
want to estimate a single mean with application to a wider area than that 
of the experimental plots used in the experiment. 

If the experimenter is interested only in the variance components, the
problem is certainly much different from that of estimating a mean. In
this case he wants confidence limits for the variance components, or
functions of the variance components. We have stated previously that
nonnormality was probably not too serious for estimating the confidence
limits for means, and if it were serious, some simple transformation would
generally remedy the situation. However, the situation is not so simple
for variance components. As we shall see later, the only estimates of the
variances of these estimated variance components will be in terms of the
components themselves, a satisfactory condition if normality holds.
However, slight deviations from normality may require a knowledge of
other data than that furnished by the analysis of variance (such as higher
moments) in order to estimate these variances. As will be shown later,
the problem of setting up confidence limits for variance components is far
from solved.

The estimates of the variance components will be designated as $\hat{\sigma}^2 = s^2$
with the same subscripts. Obviously $\hat{\mu} = \bar{Y}$. Since we have more than
one component of variation in this model, the method of least squares
cannot be used in the estimation process. Instead we must utilize a more
general estimation procedure, such as the method of maximum likelihood.
In order to present this material in its practical connection with the
analysis-of-variance problems discussed previously, we shall assume that
the data have been summarized in an analysis-of-variance table and then
proceed to derive the expected values of the mean squares, using our
variance-components model. Then the estimates of these variance
components are found by equating the mean squares and their expectations,
where each $\sigma^2$ is replaced by its corresponding $s^2$ in the expectations.
These analysis-of-variance estimates are the same as the
maximum-likelihood estimates for an orthogonal model such as the randomized
complete-blocks model, if the errors are normally distributed. For
nonorthogonal models, such as a disproportionate frequency factorial, the
maximum-likelihood equations are very difficult to solve (requiring
iterative procedures in many cases), and the two sets of estimates would
not be the same. We can derive unbiased estimates of the variance
components from the analysis of variance, but there is no guarantee that these
estimates are the most efficient which could be devised; in fact it is often
doubtful that they are even moderately efficient for multiple-classification
data with unequal numbers in the subclasses.

The analysis of variance pertaining to the randomized complete-blocks
model is:

Source of        Degrees of       Sum of     Mean
variation        freedom          squares    square†         E(V)†

Blocks           r - 1            SSB        MSB = $V_1$       $\sigma_e^2 + q\sigma_{tb}^2 + pq\sigma_b^2 = \sigma_1^2$
Treatments       p - 1            SST        MST = $V_2$       $\sigma_e^2 + q\sigma_{tb}^2 + rq\sigma_t^2 = \sigma_2^2$
T X B            (r - 1)(p - 1)   SS(TB)     MS(TB) = $V_3$    $\sigma_e^2 + q\sigma_{tb}^2 = \sigma_3^2$
Sampling error   (q - 1)rp        SSE        MSE = $V_4$       $\sigma_e^2 = \sigma_4^2$

† We shall use V to stand for a mean square.


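$$SSB = \frac{\sum_j (B_j - G/r)^2}{pq} = \frac{\sum_j [(B_j - pq\mu) - (G - rpq\mu)/r]^2}{pq} = \frac{\sum_{j=1}^{r} (B_j - pq\mu)^2}{pq} - \frac{(G - rpq\mu)^2}{rpq},$$

$$B_j - pq\mu = q\sum_i \tau_i + pq\beta_j + q\sum_i (\tau\beta)_{ij} + \sum_i\sum_k \epsilon_{ijk},$$

$$G - rpq\mu = rq\sum_i \tau_i + pq\sum_j \beta_j + q\sum_i\sum_j (\tau\beta)_{ij} + \sum_i\sum_j\sum_k \epsilon_{ijk}.$$

Hence

$$E(SSB) = \frac{r(pq^2\sigma_t^2 + p^2q^2\sigma_b^2 + pq^2\sigma_{tb}^2 + pq\sigma_e^2)}{pq} - \frac{r^2pq^2\sigma_t^2 + rp^2q^2\sigma_b^2 + rpq^2\sigma_{tb}^2 + rpq\sigma_e^2}{rpq}$$

$$= (r - 1)pq\sigma_b^2 + (r - 1)q\sigma_{tb}^2 + (r - 1)\sigma_e^2,$$

$$E(V_1) = pq\sigma_b^2 + q\sigma_{tb}^2 + \sigma_e^2 = \sigma_1^2.$$

Since SST will be the same as SSB except for $p$ and $r$ and $\sigma_b^2$ and $\sigma_t^2$
interchanged, it is evident that

$$SST = \frac{\sum_i (T_i - rq\mu)^2}{rq} - \frac{(G - rpq\mu)^2}{rpq},$$

$$T_i - rq\mu = rq\tau_i + q\sum_j \beta_j + q\sum_j (\tau\beta)_{ij} + \sum_j\sum_k \epsilon_{ijk},$$

$$E(SST) = (p - 1)(rq\sigma_t^2 + q\sigma_{tb}^2 + \sigma_e^2),$$

$$E(V_2) = rq\sigma_t^2 + q\sigma_{tb}^2 + \sigma_e^2 = \sigma_2^2.$$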


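Also, we know that

$$SS(TB) = \sum_i\sum_j \left[(TB)_{ij} - \frac{T_i}{r} - \frac{B_j}{p} + \frac{G}{rp}\right]^2 \Big/ q$$

$$= \frac{\sum_i\sum_j [(TB)_{ij} - q\mu]^2}{q} - \frac{\sum_i (T_i - rq\mu)^2}{rq} - \frac{\sum_j (B_j - pq\mu)^2}{pq} + \frac{(G - rpq\mu)^2}{rpq},$$

where

$$(TB)_{ij} - q\mu = q[\tau_i + \beta_j + (\tau\beta)_{ij}] + \sum_k \epsilon_{ijk}.$$

Hence

$$E[SS(TB)] = rp[q(\sigma_t^2 + \sigma_b^2 + \sigma_{tb}^2) + \sigma_e^2] - p[rq\sigma_t^2 + q\sigma_b^2 + q\sigma_{tb}^2 + \sigma_e^2]$$

$$\qquad - r[q\sigma_t^2 + pq\sigma_b^2 + q\sigma_{tb}^2 + \sigma_e^2] + [rq\sigma_t^2 + pq\sigma_b^2 + q\sigma_{tb}^2 + \sigma_e^2]$$

$$= q(rp - p - r + 1)\sigma_{tb}^2 + (rp - p - r + 1)\sigma_e^2$$

$$= (r - 1)(p - 1)(q\sigma_{tb}^2 + \sigma_e^2),$$

$$E(V_3) = q\sigma_{tb}^2 + \sigma_e^2 = \sigma_3^2.$$

Finally we know that the total sum of squares is

$$Sy^2 = S(Y - G/rpq)^2 = S(Y - \mu)^2 - (G - rpq\mu)^2/rpq.$$

The expected value of this total is

$$rq(p - 1)\sigma_t^2 + pq(r - 1)\sigma_b^2 + q(rp - 1)\sigma_{tb}^2 + (rpq - 1)\sigma_e^2.$$

It is easy to show that

$$E(V_4) = \sigma_e^2 = \sigma_4^2.$$

The residual also can be expressed as†

$$e_{ijk} = Y_{ijk} - (TB)_{ij}/q = \epsilon_{ijk} - \sum_k \epsilon_{ijk}/q,$$

$$E(e_{ijk}^2) = (q - 1)\sigma_e^2/q, \qquad E(SSE) = rp(q - 1)\sigma_e^2.$$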


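The general mean is

$$\bar{Y} = \mu + \frac{\sum_i \tau_i}{p} + \frac{\sum_j \beta_j}{r} + \frac{\sum_i\sum_j (\tau\beta)_{ij}}{rp} + \frac{\sum_i\sum_j\sum_k \epsilon_{ijk}}{rpq},$$

$$E(\bar{Y}) = \mu \quad\text{and}\quad \sigma^2(\bar{Y}) = \frac{\sigma_t^2}{p} + \frac{\sigma_b^2}{r} + \frac{\sigma_{tb}^2}{rp} + \frac{\sigma_e^2}{rpq}.$$

If a particular treatment mean is of importance,

$$\bar{T}_i = \frac{T_i}{rq} = \mu + \tau_i + \frac{\sum_j \beta_j}{r} + \frac{\sum_j (\tau\beta)_{ij}}{r} + \frac{\sum_j\sum_k \epsilon_{ijk}}{rq},$$

$$E(\bar{T}_i) = \mu + \tau_i,$$

$$\sigma^2(\bar{T}_i) = \frac{\sigma_b^2}{r} + \frac{\sigma_{tb}^2}{r} + \frac{\sigma_e^2}{rq} = \frac{\sigma_1^2 + (p - 1)\sigma_3^2}{rpq}.$$

This assumes $\tau_i$ is now a fixed variate (see Chap. 23).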

The relative importance of various components can be assessed in
$\sigma^2(\bar{Y})$ or $\sigma^2(\bar{T}_i)$ and compared with the costs of obtaining the sample to
enable the experimenter better to plan his future experiments to estimate
$\mu$ or $\tau_i$. For example, if it costs $C_e$ for each sample in a plot and $C_p$ for
each plot, then the total cost per treatment is

$$C_t = r'q'C_e + r'C_p,$$

where $r'$ plots are sampled and $q'$ samples per plot. Suppose now that
the total cost per treatment is fixed at $C$. We are then led to minimize
$\sigma^2(\bar{T}_i)$ subject to the restriction $C_t = C$, in other words, to select $r'$ and $q'$
so as to minimize

$$\sigma^2(\bar{T}_i) + \lambda(C_t - C).$$

Setting the derivatives with respect to $q'$ and $r'$ equal to zero gives

$$-\frac{\sigma_e^2}{r'q'^2} + \lambda r'C_e = 0, \qquad -\frac{1}{r'^2}\left(\sigma_b^2 + \sigma_{tb}^2 + \frac{\sigma_e^2}{q'}\right) + \lambda(q'C_e + C_p) = 0, \qquad C_t = C.$$

The solutions are

$$q' = \sqrt{\frac{C_p\sigma_e^2}{C_e(\sigma_b^2 + \sigma_{tb}^2)}}, \qquad r' = \frac{C}{q'C_e + C_p}.$$

† See the footnote on page 314 for a discussion of this residual. $\epsilon$ may represent
either sampling error or plot-to-plot error, depending on the randomization plan.

Of course $q'$ and $r'$ must be integers, and so the exact minimum in general
will not be reached.
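The allocation rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names and the numerical values (variance components 100, 20, and 5; costs 1 and 4; budget 100) are hypothetical and are not taken from the text. The first function evaluates the continuous solution for $q'$ and $r'$; the last one searches over integer plans that stay within the budget, since the continuous optimum is generally not attainable.

```python
import math


def optimal_allocation(var_e, var_b, var_tb, cost_sample, cost_plot, total_cost):
    """Continuous (q', r') minimizing the variance of a treatment mean
    subject to total_cost = r' * q' * cost_sample + r' * cost_plot."""
    q_opt = math.sqrt(cost_plot * var_e / (cost_sample * (var_b + var_tb)))
    r_opt = total_cost / (q_opt * cost_sample + cost_plot)
    return q_opt, r_opt


def var_treatment_mean(var_e, var_b, var_tb, r, q):
    """sigma^2(T_i-bar) = (sigma_b^2 + sigma_tb^2)/r + sigma_e^2/(r*q)."""
    return (var_b + var_tb) / r + var_e / (r * q)


def best_integer_plan(var_e, var_b, var_tb, cost_sample, cost_plot,
                      total_cost, q_max=50):
    """Search integer q' and the largest affordable integer r' for each q'."""
    best = None
    for q in range(1, q_max + 1):
        r = int(total_cost // (q * cost_sample + cost_plot))
        if r < 1:
            break
        v = var_treatment_mean(var_e, var_b, var_tb, r, q)
        if best is None or v < best[2]:
            best = (q, r, v)
    return best


# Hypothetical inputs, chosen only to give round numbers.
q_opt, r_opt = optimal_allocation(100.0, 20.0, 5.0, 1.0, 4.0, 100.0)
plan = best_integer_plan(100.0, 20.0, 5.0, 1.0, 4.0, 100.0)
```

With these hypothetical inputs the continuous solution is $q' = 4$, $r' = 12.5$; the integer search may settle on a neighboring plan that uses the budget more fully.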

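In order to make the estimates of the variance components ($\sigma_e^2$, $\sigma_{tb}^2$, $\sigma_t^2$,
and $\sigma_b^2$) unbiased, we merely replace the components by their estimates
($s_e^2$, $s_{tb}^2$, $s_t^2$, and $s_b^2$, respectively) in the E(V) column and equate respective
rows in the V and E(V) columns. Hence $\hat{\sigma}_e^2 = s_e^2 = V_4$, and

$$\hat{\sigma}_{tb}^2 = s_{tb}^2 = \frac{V_3 - V_4}{q}, \qquad \hat{\sigma}_t^2 = s_t^2 = \frac{V_2 - V_3}{rq}, \qquad \hat{\sigma}_b^2 = s_b^2 = \frac{V_1 - V_3}{pq}.$$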

It can be shown that all of the quantities $[B_j - G/r]$, $[T_i - G/p]$,
$[Y_{ijk} - (TB)_{ij}/q]$, and $[(TB)_{ij} - T_i/r - B_j/p + G/rp]$ are orthogonal to
one another. Hence the various mean squares are noncorrelated. Since
any $V_i$ is a sum of squared linear functions of NID variates, it is
independently distributed as $\chi^2\sigma_i^2/f_i$ with $f_i$ degrees of freedom, where $i$ stands
for some particular line in the analysis-of-variance table with $f_i$ degrees of
freedom in that line and $\sigma_i^2 = E(V_i)$. Also,

$$\sigma^2(V_i) = \frac{2\sigma_i^4}{f_i} \doteq \frac{2V_i^2}{f_i + 2}.\dagger$$

† Some attention needs to be paid to the use of the fourth moment of the
observations in estimating $\sigma^2(V_i)$. In some sampling problems, it may be possible to obtain
several independent samples, determining $V_i$ from each, and then estimating $\sigma^2(V_i)$
from the sample-to-sample fluctuations.

In order to use the method of maximum likelihood to estimate the
variance components, some preliminary transformations are needed.
When the only random effect is $\epsilon$ ($\tau$, $\beta$, and $\tau\beta$ are fixed effects), the $Y_{ijk}$ are
independent. However, when some of the other effects are also random, the $Y_{ijk}$ are
correlated. For example when everything is random except $\mu$, the
covariance between $Y_{ijk}$ and $Y_{ijk'}$ is

$$\sigma_t^2 + \sigma_b^2 + \sigma_{tb}^2.$$

Hence the simultaneous distribution of the $Y$'s is a multivariate normal
distribution with nonzero covariance elements. This distribution can be
simplified by splitting $(Y_{ijk} - \bar{Y})$ into the sum of orthogonal parts as
follows:

$$Y_{ijk} - \bar{Y} = (qY_{ijk} - TB_{ij})/q + (TB_{ij} - T_i/r - B_j/p + G/rp)/q + (pT_i - G)/rpq + (rB_j - G)/rpq,$$

where each quantity in parentheses is orthogonal to every other such
quantity and has an expectation of zero.‡
Therefore

$$\sum_{ijk} (Y_{ijk} - \bar{Y})^2 = \sum_{ijk} (Y_{ijk} - TB_{ij}/q)^2 + \sum_{ij} (TB_{ij} - T_i/r - B_j/p + G/rp)^2/q + \sum_i (T_i - G/p)^2/rq + \sum_j (B_j - G/r)^2/pq$$

$$= (q - 1)rpV_4 + (r - 1)(p - 1)V_3 + (p - 1)V_2 + (r - 1)V_1.$$

If the components are NID, these four sums of squares are independently
distributed as $\chi_i^2\sigma_i^2$ with $f_i$ degrees of freedom, where the successive $\sigma_i^2$ are

$$\sigma_4^2 = \sigma_e^2, \quad \sigma_3^2 = (\sigma_e^2 + q\sigma_{tb}^2), \quad \sigma_2^2 = (\sigma_e^2 + q\sigma_{tb}^2 + rq\sigma_t^2), \quad \sigma_1^2 = (\sigma_e^2 + q\sigma_{tb}^2 + pq\sigma_b^2).$$

The respective $f_i$ are

$$f_4 = rp(q - 1), \quad f_3 = (r - 1)(p - 1), \quad f_2 = (p - 1), \quad f_1 = (r - 1).$$

Since the above sums of squares are independently distributed as $\chi_i^2\sigma_i^2$,
their joint frequency distribution is that of four $\chi^2$ distributions.‡ By
transforming from the distributions of $\chi^2$ to those of the $V$'s, it can be
shown that the logarithm of the likelihood function ($L$) is

$$L = \text{constant} - \frac{1}{2}\sum_i f_i\left(\log \sigma_i^2 + \frac{V_i}{\sigma_i^2}\right).$$

Hence the ML estimates are

$$\hat{\sigma}_i^2 = V_i, \qquad i = 1, 2, 3, 4.$$

And by subtraction we find that

$$\hat{\sigma}_e^2 = V_4, \quad \hat{\sigma}_{tb}^2 = \frac{V_3 - V_4}{q}, \quad \hat{\sigma}_t^2 = \frac{V_2 - V_3}{rq}, \quad \hat{\sigma}_b^2 = \frac{V_1 - V_3}{pq}.$$

‡ Note that we are studying only deviations from the sample mean. This is
equivalent to studying the likelihood of the four parts of the analysis of variance
neglecting any information which $\bar{Y}$ furnishes about the variance components.

Since the mean squares are orthogonal, the variance of an estimate of a
variance component is simply a multiple of the sums of the variances of
the mean squares used in estimating this variance component. That is,
a variance component can be written in the form

$$s^2 = \frac{1}{c}\sum_i a_iV_i,$$

where $a_i = \pm 1$. Hence

$$\sigma^2(s^2) = \frac{2}{c^2}\sum_i \frac{\sigma_i^4}{f_i} \doteq \frac{2}{c^2}\sum_i \frac{V_i^2}{f_i + 2} = s'^2.$$

For example,

$$s_b^2 = \frac{V_1 - V_3}{pq}, \qquad \sigma^2(s_b^2) \doteq \frac{2}{(pq)^2}\left[\frac{V_1^2}{r + 1} + \frac{V_3^2}{(r - 1)(p - 1) + 2}\right] = s_b'^2.$$
The problem of estimating confidence limits for $\sigma_b^2$ is still not completely
solved. Some of the available procedures are indicated by Bross. 13
These are:

(i) When $r$ and $p$ are large, the distribution of $s_b^2$ approaches normality.
In this case the $(1 - \alpha)$ confidence limits are

$$s_b^2 - T_\alpha s_b' < \sigma_b^2 < s_b^2 + T_\alpha s_b',$$

where $P(|T| > T_\alpha) = \alpha$ and $T$ is a normal deviate.

(ii) Satterthwaite 11 suggested that $s^2$ was approximately distributed
as $\chi^2\sigma^2/f'$, with $f'$ degrees of freedom, where

$$f' = \left(\sum a_iV_i\right)^2 \Big/ \sum (a_i^2V_i^2/f_i).$$

Presumably it is best to use the nearest integer for $f'$. In Example 10.2
it was shown that the $(1 - \alpha)$ confidence limits for $\sigma^2$ are

$$\frac{f's^2}{\chi_2^2} < \sigma^2 < \frac{f's^2}{\chi_1^2},$$

where $P(\chi^2 > \chi_2^2) = P(\chi^2 < \chi_1^2) = \alpha/2$. For $\sigma_b^2$,

$$f_b' = (V_1 - V_3)^2/[(V_1^2/f_1) + (V_3^2/f_3)].$$

Satterthwaite warned that when some of the $a$'s are negative, as in our
case, caution must be exercised in the use of this $\chi^2$ approximation.

(iii) If $s^2 = (V_i - V_j)/c$ and $\sigma^2 = E(s^2) > 0$,

$$F_0 = \frac{V_i}{V_j} = F\frac{\sigma_i^2}{\sigma_j^2} = F\left[1 + \frac{c\sigma^2}{\sigma_j^2}\right]$$

or

$$F = \frac{F_0}{[1 + (c\sigma^2/\sigma_j^2)]},$$

where $\sigma^2$ is the variance component under consideration.

Let $F_2$ and $F_1$ be tabular values of $F$, with $f_i$ and $f_j$ degrees of freedom,
such that

$$P(F > F_2) = P(F < F_1) = \alpha/2.$$

Hence

$$\frac{\alpha}{2} = P\left\{F_0 > F_2\left[1 + \frac{c\sigma^2}{\sigma_j^2}\right]\right\} = P\left\{F_0 < F_1\left[1 + \frac{c\sigma^2}{\sigma_j^2}\right]\right\}.$$

If we solve inside the brackets for $\sigma^2$, we find that

$$\frac{\alpha}{2} = P\left\{\sigma^2 < \left(\frac{F_0}{F_2} - 1\right)\frac{\sigma_j^2}{c}\right\} = P\left\{\sigma^2 > \left(\frac{F_0}{F_1} - 1\right)\frac{\sigma_j^2}{c}\right\}.$$

Hence

$$P\left\{\left(\frac{F_0}{F_2} - 1\right)\frac{\sigma_j^2}{c} < \sigma^2 < \left(\frac{F_0}{F_1} - 1\right)\frac{\sigma_j^2}{c}\right\} = 1 - \alpha.$$

If we replace $\sigma_j^2/c$ by its estimate, $V_j/c = s^2/(F_0 - 1)$, the confidence
limits for $\sigma^2$ are

$$\frac{(F_0/F_2) - 1}{F_0 - 1}\,s^2 < \sigma^2 < \frac{(F_0/F_1) - 1}{F_0 - 1}\,s^2.$$

(iv) Using the fiducial probability concepts of R. A. Fisher, 14 Bross 13
derived the following $(1 - \alpha)$ fiducial limits for $\sigma^2$:

$$\frac{(F_0/F_2) - 1}{F_2'(F_0/F_2) - 1}\,s^2 < \sigma^2 < \frac{(F_0/F_1) - 1}{F_1'(F_0/F_1) - 1}\,s^2,$$

where $F_i' = F_i$ for ($f_i$ and $\infty$) degrees of freedom.

In (iii) and (iv), $F_1 = 1/F_2(f_j, f_i)$, where $F_2 = F_2(f_i, f_j)$ (see Exercise 7.29).
In all cases, if the lower limit is negative, it is replaced by zero. This
will occur when $F_0$ is nonsignificant at the $\alpha/2$ significance level ($F_0 < F_2$).

The problem of setting confidence limits for $\mu$ is also complicated,
because one does not know how many degrees of freedom to use for $t$.
The Satterthwaite approximation 11 presumably can also be used here.
For the given experiment,

$$\sigma^2(\bar{Y}) = (\sigma_1^2 + \sigma_2^2 - \sigma_3^2)/rpq.$$

Hence $s^2(\bar{Y}) = (V_1 + V_2 - V_3)/rpq$, with approximately $f'$ degrees of
freedom, where

$$f' = \frac{(V_1 + V_2 - V_3)^2}{\dfrac{V_1^2}{r - 1} + \dfrac{V_2^2}{p - 1} + \dfrac{V_3^2}{(r - 1)(p - 1)}}.$$

Of course, if $r$ and $p$ are large, it is safe to use $t_{.05} = 2$.

The problem for a single treatment mean is slightly different, because

$$\sigma^2(\bar{T}_i) = [\sigma_1^2 + (p - 1)\sigma_3^2]/rpq.$$

Hence $s^2(\bar{T}_i) = [V_1 + (p - 1)V_3]/rpq$ with $f''$ degrees of freedom, where

$$f'' = \frac{[V_1 + (p - 1)V_3]^2}{\dfrac{V_1^2}{r - 1} + \dfrac{[(p - 1)V_3]^2}{(r - 1)(p - 1)}} = \frac{(r - 1)[V_1 + (p - 1)V_3]^2}{V_1^2 + (p - 1)V_3^2}.$$


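Example 22.1. Let us consider the example presented by Crump 2 of a
series of genetic experiments by J. W. Gowen on the number of eggs laid
by each of 12 females, from 25 races of Drosophila melanogaster on the
fourth day of laying, the whole experiment being carried out 4 times
($r = 4$, $p = 25$, $q = 12$). The analysis of variance was as follows:

Source of variation       Degrees of    Mean square      E(MS)
                          freedom

Experiments (blocks)          3         $V_1$ = 46,659     $\sigma_1^2 = \sigma_e^2 + 12\sigma_{tb}^2 + 300\sigma_b^2$
Races (treatments)           24         $V_2$ = 3,243      $\sigma_2^2 = \sigma_e^2 + 12\sigma_{tb}^2 + 48\sigma_t^2$
T X B                        72         $V_3$ = 459        $\sigma_3^2 = \sigma_e^2 + 12\sigma_{tb}^2$
Females in subclasses      1100         $V_4$ = 231        $\sigma_4^2 = \sigma_e^2$

The estimates of the variance components are

$$s_e^2 = 231, \qquad s_{tb}^2 = \frac{459 - 231}{12} = 19, \qquad s_t^2 = \frac{3{,}243 - 459}{48} = 58, \qquad s_b^2 = \frac{46{,}659 - 459}{300} = 154,$$

and the estimated variances of these estimates are

$$s_e'^2 = \frac{2(231)^2}{1{,}102} = 97, \qquad s_{tb}'^2 = \frac{2}{144}\left[\frac{(231)^2}{1{,}102} + \frac{(459)^2}{74}\right] = 40,$$

$$s_t'^2 = \frac{2}{2{,}304}\left[\frac{(459)^2}{74} + \frac{(3{,}243)^2}{26}\right] = 354, \qquad s_b'^2 = \frac{2}{90{,}000}\left[\frac{(459)^2}{74} + \frac{(46{,}659)^2}{5}\right] = 9{,}676.$$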

The estimated variances of a general mean and a race mean with $r'$
experiments, $p'$ races, and $q'$ females per subclass are

$$s^2(\bar{Y}) = \frac{231}{r'p'q'} + \frac{19}{r'p'} + \frac{58}{p'} + \frac{154}{r'},$$

$$s^2(\bar{T}_i) = \frac{231}{r'q'} + \frac{19}{r'} + \frac{154}{r'} = \frac{231}{r'q'} + \frac{173}{r'}.$$

The most important component appears to be the variation from
experiment to experiment ($\sigma_b^2$). Hence, in order materially to cut down the
variance of a race mean, it is necessary to increase the number of
experiments ($r$), since increasing $r$ decreases both parts of $s^2(\bar{T}_i)$.
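The arithmetic of this example can be checked with a short Python sketch. The function names are ours (hypothetical), not the authors'; the mean squares, degrees of freedom, and divisors are those quoted in the example. The second function applies the approximation $2\sum V_i^2/(f_i + 2)/c^2$ for the variance of an estimated component.

```python
def estimate_components(v1, v2, v3, v4, r, p, q):
    """Analysis-of-variance estimates for the randomized-blocks model
    with all effects random (blocks, treatments, interaction, sampling)."""
    return {
        "e": v4,
        "tb": (v3 - v4) / q,
        "t": (v2 - v3) / (r * q),
        "b": (v1 - v3) / (p * q),
    }


def estimated_variance(mean_squares_and_df, c):
    """Approximate variance of s^2 = (sum of +/- V_i) / c,
    using 2 * V_i^2 / (f_i + 2) for the variance of each mean square."""
    return 2.0 / c ** 2 * sum(v ** 2 / (f + 2) for v, f in mean_squares_and_df)


# Values quoted in Example 22.1 (r = 4, p = 25, q = 12).
V1, V2, V3, V4 = 46659.0, 3243.0, 459.0, 231.0
F1, F2, F3, F4 = 3, 24, 72, 1100
r, p, q = 4, 25, 12

s2 = estimate_components(V1, V2, V3, V4, r, p, q)
var_e = estimated_variance([(V4, F4)], 1)
var_tb = estimated_variance([(V4, F4), (V3, F3)], q)
var_b = estimated_variance([(V3, F3), (V1, F1)], p * q)

for name, value in s2.items():
    print(name, round(value))
print(round(var_e), round(var_tb), round(var_b), round(var_b ** 0.5, 1))
```

The square root of the last variance is the standard error of the block component that enters the normal-approximation limits discussed next.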

Since $\sigma_b^2$ is so important, let us consider the different methods of setting
90 per cent confidence limits for it.

Method            5% lower limit    5% upper limit    10% upper limit

Normal (i)              0                 316
$\chi^2$ (ii)               59               1,313               791
F (iii)                55               1,331               801
Fiducial (iv)          58               1,325               797

Some of the data needed for the above results were:

$s_b' = 98.4$,        $f_b' = 3$,              $F_0 = 101.65$,
$\chi_{.05}^2 = 7.81$,      $\chi_{.95}^2 = .352$,         $\chi_{.90}^2 = .584$,
$F_{.05} = 2.74$,      $F_{.95} = 1/8.57$,       $F_{.90}$ = [illegible in the source],
$F_{.05}' = 2.60$,     $F_{.95}' = 1/8.53$,      $F_{.90}' = 1/5.13$.

We computed the 10 per cent upper limit for the last three methods,
assuming that the lower limit would be zero. Actually, when all the
probability is put on the upper tail, the lower limit will be slightly
negative for (iii) and (iv), but we have postulated that $\sigma_b^2 > 0$. When there
are so few degrees of freedom for estimating a component such as $\sigma_b^2$, the
confidence limits should be quite wide. The normal approximation is
definitely unsatisfactory in this case. The other three estimates are
remarkably close together. Bross 13 gives some examples where this is
not the case. He criticizes the $\chi^2$ method in one case of a nonsignificant
$F_0$, for which methods (iii) and (iv) will give negative lower limits (assumed
0 since $\sigma_b^2 > 0$), whereas the $\chi^2$ lower limit is greater than zero. It might
be advisable, when $F_0$ is nonsignificant, to put all of the probability on
the upper tail. This procedure should be considered from a theoretical
standpoint.
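As a rough check on the table of limits, the four methods can be written as small Python functions that take the tabular values as arguments. This is a sketch, not the authors' computing scheme: the function names are hypothetical, the tabular $\chi^2$ and $F$ values are the ones listed with the example, and the normal deviate 1.645 (5 per cent in each tail) is an assumption taken from standard normal tables rather than from the text.

```python
def normal_limits(s2, std_err, deviate):
    """Method (i): s^2 -/+ T * s', with a negative lower limit set to zero."""
    return max(0.0, s2 - deviate * std_err), s2 + deviate * std_err


def chi2_limits(s2, f, chi2_upper_tail, chi2_lower_tail):
    """Method (ii): f' s^2 / chi^2 with Satterthwaite's f'."""
    return f * s2 / chi2_upper_tail, f * s2 / chi2_lower_tail


def f_limits(s2, f0, f_upper, f_lower):
    """Method (iii): limits based on the observed ratio F_0 = V_i / V_j."""
    lower = (f0 / f_upper - 1.0) / (f0 - 1.0) * s2
    upper = (f0 / f_lower - 1.0) / (f0 - 1.0) * s2
    return max(0.0, lower), upper


def fiducial_limits(s2, f0, f_upper, f_lower, f_upper_inf, f_lower_inf):
    """Method (iv): fiducial limits; the *_inf arguments are the tabular
    F values with (f_i, infinity) degrees of freedom."""
    lower = (f0 / f_upper - 1.0) / (f_upper_inf * f0 / f_upper - 1.0) * s2
    upper = (f0 / f_lower - 1.0) / (f_lower_inf * f0 / f_lower - 1.0) * s2
    return max(0.0, lower), upper


# Example 22.1: block component, 90 per cent limits.
s2_b, se_b, f0 = 154.0, 98.4, 101.65
limits = {
    "normal": normal_limits(s2_b, se_b, 1.645),   # 1.645 assumed from normal tables
    "chi2": chi2_limits(s2_b, 3, 7.81, 0.352),
    "F": f_limits(s2_b, f0, 2.74, 1 / 8.57),
    "fiducial": fiducial_limits(s2_b, f0, 2.74, 1 / 8.57, 2.60, 1 / 8.53),
}
for method, (low, high) in limits.items():
    print(method, round(low, 1), round(high, 1))
```

Passing the tabular values in explicitly keeps the sketch free of any statistical library; the printed limits agree with the table to within rounding.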

For this particular sample, $s^2(\bar{Y}) = 41.20$, and

$$f' = \frac{(49{,}443)^2}{\dfrac{(46{,}659)^2}{3} + \dfrac{(3{,}243)^2}{24} + \dfrac{(459)^2}{72}} \doteq 3.4;$$

also, $s^2(\bar{T}_i) = 48.06$, and

$$f'' = \frac{3(57{,}675)^2}{(46{,}659)^2 + (24)(459)^2} \doteq 4.6.$$

Hence the 95 per cent confidence limits would be

$$\bar{Y} - (2.3)(6.4) < \mu < \bar{Y} + (2.3)(6.4),$$

$$\bar{T}_i - (2.1)(6.9) < \mu + \tau_i < \bar{T}_i + (2.1)(6.9).$$
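The approximate degrees of freedom used here follow Satterthwaite's rule, $(\sum a_iV_i)^2/\sum (a_iV_i)^2/f_i$, which is easy to sketch in Python. The function name is hypothetical; the coefficients, mean squares, and degrees of freedom are those of Example 22.1.

```python
def satterthwaite_df(terms):
    """terms: iterable of (coefficient a_i, mean square V_i, degrees of freedom f_i).
    Returns (sum a_i V_i)^2 / sum((a_i V_i)^2 / f_i)."""
    numerator = sum(a * v for a, v, _ in terms) ** 2
    denominator = sum((a * v) ** 2 / f for a, v, f in terms)
    return numerator / denominator


V1, V2, V3 = 46659.0, 3243.0, 459.0   # mean squares from Example 22.1
r, p, q = 4, 25, 12

# General mean: s^2(Y-bar) = (V1 + V2 - V3) / rpq.
s2_mean = (V1 + V2 - V3) / (r * p * q)
f_mean = satterthwaite_df([(1, V1, 3), (1, V2, 24), (-1, V3, 72)])

# Race mean: s^2(T_i-bar) = [V1 + (p - 1) V3] / rpq.
s2_race = (V1 + (p - 1) * V3) / (r * p * q)
f_race = satterthwaite_df([(1, V1, 3), (p - 1, V3, 72)])

# Block component: s_b^2 = (V1 - V3) / pq.
f_block = satterthwaite_df([(1, V1, 3), (-1, V3, 72)])

print(round(s2_mean, 2), round(f_mean, 1))
print(round(s2_race, 2), round(f_race, 1))
print(round(f_block, 2))
```

The same function reproduces the value $f_b' = 3$ (after rounding to the nearest integer) used for the $\chi^2$ limits on the block component.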
22.3. Repeated Subsampling with Equal Numbers in the Subclasses.

A sampling technique ordinarily used in sample surveys and in many
sampling experiments in the physical and biological sciences is that of
repeated subsampling, sometimes called nested sampling (see references 15
and 16). We shall assume that there are four tiers in the universe: A, B
in A, C in B, D in C. Obviously there may be more or less, but we shall
illustrate the methods with four. Assume that the number of possible
samples from A is large enough so that the sample represents only a small
segment of the tier; hence, regardless of the sampling rate within the
selected A units, the sampling rate for all B, C, and D units will be small.
Suppose that $a$ A units are selected, $b$ B units from each of the A units, $c$
C units from each B unit, and $d$ D units from each C unit, the total number
of samples being $n = abcd$. The model is

$$Y_{ijkm} = \mu + \alpha_i + \beta_{ij} + \gamma_{ijk} + \delta_{ijkm},$$

where $i = 1, 2, \ldots, a$; $j = 1, 2, \ldots, b$; $k = 1, 2, \ldots, c$; $m = 1,
2, \ldots, d$. All of the effects, except $\mu$, are assumed NID$(0, \sigma^2)$, where
$\sigma^2 = \sigma_a^2$, $\sigma_b^2$, $\sigma_c^2$, or $\sigma_d^2$, respectively. Hence

$$\bar{Y} = \frac{\sum_i\sum_j\sum_k\sum_m Y_{ijkm}}{abcd} \quad\text{and}\quad \sigma^2(\bar{Y}) = \frac{\sigma_a^2}{a} + \frac{\sigma_b^2}{ab} + \frac{\sigma_c^2}{abc} + \frac{\sigma_d^2}{abcd}.$$

The analysis of variance is as follows:

Source of     Degrees of      Mean
variation     freedom         square     E(MS)

A             a - 1           $V_1$         $\sigma_1^2 = \sigma_d^2 + d\sigma_c^2 + cd\sigma_b^2 + bcd\sigma_a^2$
B in A        a(b - 1)        $V_2$         $\sigma_2^2 = \sigma_d^2 + d\sigma_c^2 + cd\sigma_b^2$
C in B        ab(c - 1)       $V_3$         $\sigma_3^2 = \sigma_d^2 + d\sigma_c^2$
D in C        abc(d - 1)      $V_4$         $\sigma_4^2 = \sigma_d^2$
Total         abcd - 1

The sum of squares for A is computed like the block and treatment sums
of squares in Sec. 22.2. The remaining sums are simply within sums of
squares. Hence,

$$SSA = \frac{\sum_i A_i^2}{bcd} - \frac{G^2}{abcd}, \qquad SSB = \frac{\sum_i\sum_j B_{ij}^2}{cd} - \frac{\sum_i A_i^2}{bcd}, \qquad \text{etc.},$$

where $A_i = \sum_j\sum_k\sum_m Y_{ijkm}$, $B_{ij} = \sum_k\sum_m Y_{ijkm}$, etc. The expected values of
the mean squares are computed as in Sec. 22.2. All of the theory presented
in that section pertaining to the estimates of the variance components
and their variances and confidence limits carries over here, except that
each variance component is estimated from its mean square and the one
just below. For example

$$s_a^2 = \frac{V_1 - V_2}{bcd}.$$

However, the confidence limits for $\mu$ are easier to compute, because
$s^2(\bar{Y}) = V_1/n$ with $(a - 1)$ degrees of freedom.

Example 22.2. Suppose we had a sample of 40 townships with 4 areas
per township and 2 farms from each area, giving a total of 320 farms, to
estimate the average number of acres of corn per farm. The mean corn
yield was 40 bushels per acre. The analysis of variance for the yield
data was:

Division               Degrees of    Mean
                       freedom       square     E(MS)

Townships                  39         320       $\sigma_f^2 + 2\sigma_a^2 + 8\sigma_t^2$
Areas in townships        120         240       $\sigma_f^2 + 2\sigma_a^2$
Farms in areas            160         100       $\sigma_f^2$
Total                     319

The expectations of the mean squares are given on the right, where $\sigma_f^2$ is
the farm-to-farm variance component, $\sigma_a^2$ the area component, and $\sigma_t^2$ the
township component. The estimates of these variance components are

$$s_f^2 = 100, \qquad s_a^2 = \frac{240 - 100}{2} = 70, \qquad s_t^2 = \frac{320 - 240}{8} = 10.$$

The estimated variance of the mean corn yield for $b'$ townships, $c'$ areas
per township, and $d'$ farms per area would be

$$s^2(\bar{Y}) = \frac{10}{b'} + \frac{70}{b'c'} + \frac{100}{b'c'd'}.$$

For the given sample, $s^2(\bar{Y}) = 1.00$. Hence the 95 per cent confidence
limits for $\mu$ are

$$40 - 2.0 < \mu < 40 + 2.0$$

or

$$38 < \mu < 42.$$
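For a balanced nested design each component is the difference between a mean square and the one just below it, divided by the number of samples per unit of that tier. A minimal Python sketch, with hypothetical function names and the figures of Example 22.2, is:

```python
from math import prod


def nested_components(mean_squares, units_per_parent):
    """mean_squares: top tier first, lowest tier last.
    units_per_parent: number of sub-units drawn per unit, one entry per
    tier below the top (here 4 areas per township, 2 farms per area)."""
    components = []
    for k, v in enumerate(mean_squares):
        if k + 1 < len(mean_squares):
            divisor = prod(units_per_parent[k:])
            components.append((v - mean_squares[k + 1]) / divisor)
        else:
            components.append(v)  # lowest tier: the mean square itself
    return components


def variance_of_mean(components, counts):
    """counts: units sampled at each tier, top first (b', c', d')."""
    total, n_units = 0.0, 1
    for component, m in zip(components, counts):
        n_units *= m
        total += component / n_units
    return total


components = nested_components([320.0, 240.0, 100.0], [4, 2])
print(components)                                   # township, area, farm
print(variance_of_mean(components, [40, 4, 2]))     # the sample actually taken
print(variance_of_mean(components, [80, 2, 2]))     # an alternative plan of equal size
```

The last line is only an illustration of how alternative plans can be compared: spreading the same 320 farms over more townships lowers the variance of the mean, because the township and area components are divided by larger numbers.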
22.4. Repeated Subsampling with Unequal Numbers in the Subclasses.†

Assume there are $a$ A units, each with $n_i$ samples ($\sum_i n_i = n$); each A
unit has $b_i$ B units, each B unit having $n_{ij}$ samples; [the remainder of this
sentence is illegible in the source].

[Table largely illegible in the source. Columns: Source of variation;
Degrees of freedom; Coefficients of variance components in E(MS). Rows:
A; B in A; C in B; D in C. The degrees of freedom for A are $a - 1$. The
accompanying definitions involve quantities of the form $(1/n_i) - (1/n)$.]

The sums of squares in the analysis of variance are computed as follows:

$$SSA = \sum_i \frac{A_i^2}{n_i} - \frac{G^2}{n}, \qquad SSB = \sum_i\sum_j \frac{B_{ij}^2}{n_{ij}} - \sum_i \frac{A_i^2}{n_i}, \qquad SSC = \sum_i\sum_j\sum_k \frac{C_{ijk}^2}{n_{ijk}} - \sum_i\sum_j \frac{B_{ij}^2}{n_{ij}}, \qquad \text{etc.}$$

A systematic computing procedure can be used to obtain the
coefficients of the variance components in the expectations of the mean
squares. This will be demonstrated in Example 22.3 below.

† This section may be omitted without upsetting the continuity of the book.
Example 22.3. Cochran 6 presents the analysis of a 1937 enumeration
of commercial wheat fields in 6 districts of Great Britain. The sampling
plan was to select a number of farms from each district, 1 or 2 fields per
farm, and 2 paths per field, each path having 6 bulked samples. In
addition, in district III, 2 varieties were sampled in each field
(2 paths per variety per field). The number of farms and
fields per farm for each district were:

                                                    Totals
District    No. farms    No. fields per farm    Fields    Paths

I               2        2(2)                       4        8
II              2        2, 1                       3        6
III†            1        3                          3       12
IV              9        2(2), 1(7)                11       22
V               1        2                          2        4
VI             10        2(3), 1(7)                13       26
Total          25                                  36       78

† Two varieties were sampled in each field.

The analysis of variance, excluding the 3 degrees of freedom for
varieties, is presented below in terms of hundredweights (112 pounds)
per acre (based on the means of the six bulked samples).‡

Source of variation    Degrees of    Mean
                       freedom       square     E(MS)

Districts (A)               5        10.28      $\sigma_d^2 + 2.34\sigma_c^2 + 4.90\sigma_b^2 + 11.96\sigma_a^2$
Farms (B) in A             19         6.59      $\sigma_d^2 + 2.00\sigma_c^2 + 2.58\sigma_b^2$
Fields (C) in B            11         3.03      $\sigma_d^2 + 2.36\sigma_c^2$
Paths (D) in C             39         .825      $\sigma_d^2$

‡ [Footnote illegible in the source.]

Set up a diagram like the following to help in computing the coefficients
of the variance components in E(MS):

             No.      No.        Distribution of the samples† (by district)
             units    samples    I       II            III      IV            V       VI

Total          1      $n$          78
Districts      6      $n_i$         8       6             12       22            4       26
Farms         25      $n_{ij}$        4(2)    4(1), 2(1)    12(1)    4(2), 2(7)    4(1)    4(3), 2(7)
Fields        36      $n_{ijk}$       2(4)    2(2), 2(1)    4(3)     2(4), 2(7)    2(2)    2(6), 2(7)

† Numbers in parentheses are number of farms or fields having this many samples.

This diagram reads as follows for, say, the last district: there are 26
samples from this district, 4 from each of 3 farms and 2 from each of 7 farms;
there are 13 fields each with 2 samples.

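The expectations of the various sums of squares are handled as follows:

Sum of squares                          $\sigma_a^2$                          $\sigma_b^2$                                          $\sigma_c^2$

$G^2/n$                                   $\sum n_i^2/n = 1{,}420/78 = 18.21$     $\sum\sum n_{ij}^2/n = 348/78 = 4.46$                   $\sum\sum\sum n_{ijk}^2/n = 180/78 = 2.31$
$\sum_i A_i^2/n_i$                            $n = 78$                          $\sum_i\sum_j n_{ij}^2/n_i = 28.98$                        $\sum_i\sum_j\sum_k n_{ijk}^2/n_i = 14.00$
$\sum_i\sum_j B_{ij}^2/n_{ij}$                      78                              78                                              $\sum_i\sum_j\sum_k n_{ijk}^2/n_{ij} = 52.00$
$\sum_i\sum_j\sum_k C_{ijk}^2/n_{ijk}$                 78                              78                                              78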


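To illustrate the computing procedure, consider a few examples:

(a) $$\sum_i\sum_j \frac{n_{ij}^2}{n_i} = \frac{(16)(2)}{8} + \frac{(16)(1) + (4)(1)}{6} + \frac{144}{12} + \frac{(16)(2) + (4)(7)}{22} + \frac{(16)(1)}{4} + \frac{(16)(3) + (4)(7)}{26} = 28.98,$$

(b) $$\sum_i\sum_j\sum_k \frac{n_{ijk}^2}{n_{ij}} = \frac{(4)(18)}{4} + \frac{(4)(15)}{2} + \frac{(16)(3)}{12} = 52.00.$$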
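The bookkeeping in this example lends itself to a short program. The Python sketch below is an illustration only: the nested-list layout and the function name are our own assumptions, and the coefficients are obtained by differencing the expectations of successive sums of squares and dividing by the degrees of freedom, which is the procedure the diagram is meant to support. The sample counts are those of the district table.

```python
# Samples per field, grouped by farm within district (layout of Example 22.3).
districts = [
    [[2, 2], [2, 2]],                 # I:   2 farms, 2 fields each
    [[2, 2], [2]],                    # II:  farms with 2 and 1 fields
    [[4, 4, 4]],                      # III: 1 farm, 3 fields, 2 varieties each
    [[2, 2]] * 2 + [[2]] * 7,         # IV
    [[2, 2]],                         # V
    [[2, 2]] * 3 + [[2]] * 7,         # VI
]


def ems_coefficients(design):
    """Coefficients of sigma_a^2, sigma_b^2, sigma_c^2 in E(MS) for an
    unbalanced three-stage nested layout (fields in farms in districts)."""
    n = sum(n_ijk for d in design for farm in d for n_ijk in farm)
    a = len(design)
    n_farms = sum(len(d) for d in design)
    n_fields = sum(len(farm) for d in design for farm in d)

    # Expectation rows for G^2/n, sum A^2/n_i, sum B^2/n_ij (the last row is n throughout).
    g = [0.0, 0.0, 0.0]
    row_a = [float(n), 0.0, 0.0]
    row_b = [float(n), float(n), 0.0]
    for d in design:
        n_i = sum(sum(farm) for farm in d)
        sq_farms = sum(sum(farm) ** 2 for farm in d)
        sq_fields = sum(x ** 2 for farm in d for x in farm)
        g[0] += n_i ** 2 / n
        g[1] += sq_farms / n
        g[2] += sq_fields / n
        row_a[1] += sq_farms / n_i
        row_a[2] += sq_fields / n_i
        for farm in d:
            row_b[2] += sum(x ** 2 for x in farm) / sum(farm)

    df_a, df_b, df_c = a - 1, n_farms - a, n_fields - n_farms
    return {
        "df": (df_a, df_b, df_c),
        "A": tuple((row_a[k] - g[k]) / df_a for k in range(3)),
        "B": ((n - row_a[1]) / df_b, (row_b[2] - row_a[2]) / df_b),
        "C": ((n - row_b[2]) / df_c,),
    }


coefs = ems_coefficients(districts)
print(coefs["df"])
print([round(x, 2) for x in coefs["A"]])   # coefficients of sigma_a^2, sigma_b^2, sigma_c^2
print([round(x, 2) for x in coefs["B"]])   # coefficients of sigma_b^2, sigma_c^2
print([round(x, 2) for x in coefs["C"]])   # coefficient of sigma_c^2
```

The degrees of freedom for paths are not computed here, because the example removes 3 degrees of freedom for varieties from that line.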


It should be noted that, for unequal subclass numbers, the mean squares
are no longer distributed as simply $\chi^2\sigma^2/f$, but as sums like $\lambda_1\chi_1^2 + \lambda_2\chi_2^2 +
\cdots$, where the $\lambda$'s are functions of the variance components and the number
of observations. Hence the confidence limits presented in Sec. 22.2
cannot be used here. The theory applicable to this problem is beyond this
book.


EXERCISES 

22.1. In Example 22.1, suppose that $C_e = 10$, $C_p = 120$, and $C = 960$.
Solve for $r'$ and $q'$ to minimize $s^2(\bar{T}_i)$. What is $s^2(\bar{T}_i)$ using these values
of $r'$ and $q'$? Compare this result with that obtained in the actual
experiment ($r = 4$, $q = 12$).

22.2. An experiment was run on the average stem length (in inches)
per plant of snapdragons. Seven soil types were used in each of 3
replications, 18 plants being selected at random from each plot. The analysis
of variance was as follows:

Source of variation    Degrees of    Sum of
                       freedom       squares

Replications                          39.04
Soil types                           103.15
Interaction                           19.74
Sampling error                       507.30

(a) Fill in the degrees of freedom, and compute the mean squares.

(b) If these replications and soil types can be assumed to be randomly
selected from large populations, what are the expectations of the mean
squares?

(c) Compute the estimates of the variance components in (b).

(d) The sample mean height was 32.56 inches per plant. What is the
variance of this mean and the 95 per cent confidence limits for the true
mean, under the assumptions of (b)?

(e) What is the estimated variance of a mean if $p'$ soil types, $r'$
replications, and $q'$ plants per plot were used?

22.3. An experiment was conducted to estimate the genetic and
environmental variances in corn, 7 using 192 progenies produced from 48 males
with 192 females (4 females per male). The field layout was in 12 blocks
of 32 plots each, each block having 4 male parents, each crossed with 4
different females, and each cross duplicated. The yields were based on
the pounds of grain for 10 guarded plants per plot (plants having another
plant on each side of the row). An estimate of the plant-to-plant
variability was obtained from 23 of the plots (230 plants). The analysis of
variance was as follows:

Source of variation                   Degrees of    Mean
                                      freedom       square     E(MS)†

Blocks                                    11         .153
Duplications in blocks                    12         .063
Males in blocks                           36         .167      $\sigma^2 + 10\sigma_p^2 + 20\sigma_f^2 + 80\sigma_m^2$
Females in males in blocks               144         .069      $\sigma^2 + 10\sigma_p^2 + 20\sigma_f^2$
Parents X duplications in blocks         180         .031      $\sigma^2 + 10\sigma_p^2$
Plants in plots                          207         .0153     $\sigma^2$

† $\sigma^2$ is the plant-to-plant variance, $\sigma_p^2$ the plot-to-plot variance (parent X duplicate
interaction), $\sigma_f^2$ the added variance due to females, and $\sigma_m^2$ the added variance due to
males.

(a) Derive the coefficients of the variance components in E(MS).

(b) Compute estimates of these variance components.

(c) How would you test the hypothesis that $\sigma_m^2 = \sigma_f^2$?

(d) Derive approximate confidence limits for $\sigma_f^2$ and $\sigma_m^2$.

22.4. Assume we have $p$ treatments and $r$ blocks with $a_ib_j$ samples for
the $i$th treatment and the $j$th block, with all effects random except the
mean.

(a) Set up the analysis-of-variance table.

(b) Derive formulas for the expectations of the mean squares in terms
of the $a_i$ and $b_j$.

(c) Suppose that $a_ib_j = d$. How does this affect the results in (b)?

22.5. Prove that

$$\sum_i\sum_j \left[(TB)_{ij} - \frac{T_i}{r} - \frac{B_j}{p} + \frac{G}{rp}\right]^2 \Big/ q = \frac{\sum_i\sum_j [(TB)_{ij} - q\mu]^2}{q} - \frac{\sum_i (T_i - rq\mu)^2}{rq} - \frac{\sum_j (B_j - pq\mu)^2}{pq} + \frac{(G - rpq\mu)^2}{rpq}.$$

22.6. (a) Prove that $(B_j - G/r)$ and $(T_i - G/p)$ are not correlated.
(b) Prove that each is not correlated with $[(TB)_{ij} - T_i/r - B_j/p + G/rp]$.

22.7. Show that $E(V_i^2) = (f_i + 2)\sigma_i^4/f_i$.

22.8. Show that the ML solution in Sec. 22.2 is correct. What happens
if you consider $S(Y - \mu)^2$ instead of $S(Y - \bar{Y})^2$?

22.9. (a) Given $V_i - V_j = c\chi^2\sigma^2/f'$, where $V_i = \chi_i^2\sigma_i^2/f_i$, $V_j = \chi_j^2\sigma_j^2/f_j$,
$c\sigma^2 = \sigma_i^2 - \sigma_j^2$, and each chi-square has as degrees of freedom its
denominator. Show by equating the variances of the two sides of this equation
that

$$f' = \frac{(c\sigma^2)^2}{\dfrac{\sigma_i^4}{f_i} + \dfrac{\sigma_j^4}{f_j}}.$$

Also, show that $E(V_i - V_j) = E(c\chi^2\sigma^2/f')$.

(b) How does this result connect with Satterthwaite's approximation
for confidence limits on $\sigma^2$?

22.10. Marcuse 17 presents the following data on the moisture content
of 2 sample cheeses from each of 3 different lots with 2 subsamples per
sample (remember samples 1 and 2 are not the same for the different lots):

                       Lot
Sample       I         II        III

1          39.02     35.74     37.02
           38.79     35.41     36.00
2          38.96     35.58     35.70
           39.01     35.52     36.04

(a) Set up the analysis of variance for these data with the expected
values of the mean squares, assuming everything random except the
mean.

(b) Estimate the values of the variance components.

(c) Determine 90 per cent confidence limits for the "lots" component
by the four methods outlined above.

(d) Given that the costs are 10 per lot, 3 per cheese per lot, and 1 per
subsample. Assume that 2 subsamples per cheese are to be used and
that the total cost is approximately 100, and determine the number of
lots and cheeses per lot to minimize the variance of the general mean.
Could a lower variance be obtained by using other than 2 subsamples per
cheese?

(e) What are the 95 per cent confidence limits for the mean?

22.11. Show that the expected values of the mean squares in Sec. 22.3 
are correct. 

22 . 12 . Derive the ML estimates for nested sampling with equal num¬ 
bers taken from each class. 

22 . 13 . In a 1940 Iowa AAA corn-acreage study, 2 sections were selected 
from each of 1,617 townships. The analysis of variance of the corn 
acreage per section was: 


Source of variation 

Degrees of 

Mean 

freedom 

square 

Between townships 


6,511.9 

Within townships 


1,954.3 
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(а) Fill in the degrees of freedom, and determine the expectations of 

the mean squares. 

(б) Estimate the variance components. 

(c) What is the variance of a sample mean if $r'$ townships and $q'$
sections per township were sampled?

(d) Determine 90 per cent confidence limits for the variance components. 

22.14. Rigney and Reed¹⁸ studied some of the factors affecting the
variability of estimates of various soil properties. They took samples
from 20 fields (A), 2 sections (B) from each field, 2 samples (C) consisting
of a composite of 20 borings from each section, and then 2 subsamples (D)
from each sample ($a = 20$, $b = 2$, $c = 2$, $d = 2$). The analysis of
variance for several of the properties is presented below:


Source of     Degrees of                    Mean squares
variation     freedom       Calcium   Magnesium   Potash   Organic matter

A                            19.19      .1809      .0405       12.003
B in A                        3.59      .0545      .0076        7.674
C in B                         .30      .0080      .0011         .275
D in C                         .01      .0005      .0001         .002


(a) Fill in the degrees-of-freedom column.

(b) Set up the model for this sample and then the expectations of the
mean squares in terms of the variance components.

(c) Estimate the variance components for one of the properties and
the variance of the general mean for this sample. What is the variance
of a mean of $a'$ fields, $b'$ sections per field, $c'$ samples per section,
and $d'$ subsamples per sample?

22.15. (a) In Example 22.2, suppose it cost $2.70 on the average to
get to a given township, $2.10 extra to cover each area in the townships
selected, and $3 extra to secure and analyze the data from each farm
selected in the sample. Determine values of b', c', and d' to minimize
s²(ȳ) if the total cost is fixed at $1,890. What is s²(ȳ) for this sampling
plan?

(b) Suppose it was decided to set d' = 2. Determine values of b' and
c' to minimize s²(ȳ) under this condition, and compute s²(ȳ).

22.16. For a discussion of sampling from finite populations, see
references 19 and 20.

22.17. (a) In Example 22.3, show how the expectations of the mean
squares are computed from the expectations of the sums of squares.

(b) Compute the estimates of the variance components in this example.

(c) In this example, the six districts constituted the population.
Hence what would you say regarding the variance component, $\sigma^2$,
for districts?

(d) What would be the expected value and the variance of the general 
mean from these data? Of the mean for district I? 

(e) The six bulked samples actually consisted of two sets of three sam¬ 
ples each. Show how the analysis would change if the computations were 
based on the means of the three bulked samples per set and a component 
for sets was put into the model. The actual sum of squares between sets 
in paths was 99.31. 

22.18. In Exercise 18.4, suppose that the 13 markets were chosen at
random from a large population of markets.

(a) Set up the mathematical model for this sample and the expected
values of the mean squares in the analysis of variance. Show that the
expected value of the mean square for markets is $\sigma_e^2 + k\sigma_m^2$,
where $\sigma_m^2$ is the variance component for markets and

$$k = \frac{1}{12}\left[69 - \frac{8^2 + 6^2 + \cdots + 9^2}{69}\right].$$

(b) Estimate the variance components.

(c) What would be the estimated variance of the mean of a sample 
from r' markets with q ' sellers at each market? 

(d) Suppose it costs $10 to visit a market and $1 to enumerate each 
seller, what is the optimum number of markets and sellers if it is desired 
to obtain 100 schedules? How much will it cost to obtain this many 
schedules ? 

22.19. An experiment was to be designed to test various molds for their
efficiency in the production of streptomycin. Before setting up an overall
experiment, a preliminary experiment was set up to examine the
various sources of variability in the production and assay process. There
are five stages in this process: the initial incubation stage in a test tube (or
slant, as it is generally called), a primary inoculation period in a petri dish,
a secondary inoculation period, a fermentation period in a bath, and the final
assay of the amount of streptomycin produced. Several sampling plans
could be devised to estimate the amount of variability contributed at each
stage. Two of these plans are as follows:

(i) Consider 5 test tubes, draw 2 samples from each test tube for the 
primary inoculation, 2 samples from each primary for the secondary 
inoculation, 2 samples from each secondary for the fermentation, and 
assay 2 samples from each fermentation bath. 

(ii) Use 2 test tubes as in (i); 2 test tubes the same, except only 1 assay
per fermentation; 4 test tubes the same, except only one fermentation per
secondary and one assay per fermentation; 8 test tubes, with two primaries
each and one of each stage henceforth.

(a) Show that there are 80 assays for each plan.

(b) Set up the analysis of variance for both plans, and determine the
expected values of the mean squares, assuming all effects are random.

(c) Which of these two plans would you recommend if one did not
know in advance the relative magnitude of the variance components?

CHAPTER 23


ANALYSIS OF DATA WITH BOTH RANDOM AND FIXED EFFECTS 

(MIXED MODEL) 

23.1. Randomized Complete Blocks. In this section we shall consider
the same model as in Sec. 22.2 except that the $\{\tau_i\}$ are assumed to be
fixed; in other words, inferences are to be made regarding only these
particular $p$ treatments. We have made a basic change from the model of
Sec. 18.3, in that the blocks are assumed to be representative of a larger
universe of environmental conditions than those of the particular
experiment being analyzed and we have added a (treatment × block) effect
to the model.

In Sec. 18.3, it was necessary to assume that there was no real $TB$
interaction if the blocks were assumed fixed and MS(TB) was to be used as
the error mean square. If the true model for Sec. 18.3 was

$$Y_{ij} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ij},$$

with fixed block and interaction effects, the expected value of the error
mean square, $s^2$, would have been $\sigma_e^2 + \theta_1(\tau\beta)$, where

$$\theta_1(\tau\beta) = \sum_i\sum_j (\tau\beta)_{ij}^2/(r - 1)(p - 1).$$

Hence since $E(\mathrm{MST}) = \sigma_e^2 + r\theta(\tau)$, there would have been no
real error term to test $H_0\colon \{\tau_i = 0\}$, because under $H_0$ the expected
value of the denominator of $F$ would have been greater than the expected
value of the numerator; in other words, $F$ would have been too small, and
too few significant results would have been posted.

If the experimenter is willing to assume that the block effects are
random, he finds that

$$E(s^2) = \sigma_e^2 + \sigma_{tb}^2, \qquad E(\mathrm{MST}) = \sigma_e^2 + \sigma_{tb}^2 + r\theta(\tau).$$

Hence MST/$s^2$ is distributed as $F$ under the null hypothesis,
$H_0\colon \{\tau_i = 0\}$, and it is not necessary to assume a nonexistent
interaction.

But if more than one sample is secured for each treatment in each block,
that is, $q$ samples, the experimenter can assume fixed block effects and
still have a legitimate error term even if there is a fixed interaction. In
this case, we have an estimate of the sampling error, $\sigma_e^2$. This
estimate is MSE, with $rp(q - 1)$ degrees of freedom. Hence MST/MSE is
distributed as $F$ under the null hypothesis of no treatment effects.

Let us summarize these statements with a table showing the expectations
of the treatment (T), treatment × block (TB), and sampling (E)
mean squares for various experimental conditions and assumptions.†



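                                    Assumptions

Mean      $(\tau\beta) = 0$               $(\tau\beta) \neq 0$
square                               Blocks fixed                       Blocks random

One sample per treatment per block ($q = 1$)

T         $\sigma_e^2 + r\theta(\tau)$      $\sigma_e^2 + r\theta(\tau)$            $\sigma_e^2 + \sigma_{tb}^2 + r\theta(\tau)$
TB        ($\sigma_e^2$)                    $\sigma_e^2 + \theta_1(\tau\beta)$      ($\sigma_e^2 + \sigma_{tb}^2$)

$q$ samples per treatment per block

T         $\sigma_e^2 + rq\theta(\tau)$     $\sigma_e^2 + rq\theta(\tau)$           $\sigma_e^2 + q\sigma_{tb}^2 + rq\theta(\tau)$
TB        $\sigma_e^2$  } (pool)            $\sigma_e^2 + q\theta_1(\tau\beta)$     ($\sigma_e^2 + q\sigma_{tb}^2$)
E         $\sigma_e^2$  }                   ($\sigma_e^2$)                          $\sigma_e^2$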


The proper error term is indicated in parentheses. Note that for
$(\tau\beta) = 0$ and $q > 1$, $TB$ and $E$ are both estimates of $\sigma_e^2$ and
should be pooled to obtain the error mean square; however, if the
experimenter is not sure that $(\tau\beta) = 0$, he had better use $TB$ as the
error mean square. For $(\tau\beta) \neq 0$ and blocks fixed, there is no error
term unless $q > 1$.

As mentioned before, few experimenters wish to confine their conclusions
to the particular blocks used in the experiment. Hence they
wish to assume random block effects. But if they assume random
block effects and do not wish to assume nonexistent interactions, they
can obtain an error term to test treatments even if only one sample is
obtained for each treatment per block. And this error term is the same
error term we have used all the time in Chaps. 17 to 21, the TB interaction.
But if the particular blocks used in the experiment cannot be
considered to be representative of a wider area, the experimenter must
have more than one sample per treatment per block in order to obtain an
error term.

† It does not make any difference whether blocks are fixed or not if
$(\tau\beta) = 0$. There is no sampling mean square if $q = 1$.

Now let us proceed with an experiment using the model for this section:

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \epsilon_{ijk},$$

with random block, interaction, and sampling effects and fixed treatment
effects ($p$ treatments, $r$ blocks, and $q$ samples per treatment per
block). Such an experiment is designed to estimate treatment differences
and confidence limits for the true differences and to make tests of these
differences, as well as to assess several sources of variability which cause
fluctuations in the yields of a given treatment. We are seldom interested
in the general mean, contrary to the importance of this mean in Sec. 22.2.
And we generally are interested in only the variance components which
affect treatment comparisons, because we want to plan experiments to
minimize the variance of treatment differences for fixed costs or amount of
experimental material.

We have presented the expectations of the $T$, $TB$, and $E$ mean squares
for our present model [blocks random and $(\tau\beta) \neq 0$]. There is some
controversy over the expectation of the blocks mean square, but we shall
proceed as follows. The total for block $j$ is

$$B_j = \sum_i\sum_k Y_{ijk} = pq\mu + q\sum_i \tau_i + pq\beta_j + q\sum_i (\tau\beta)_{ij} + \sum_i\sum_k \epsilon_{ijk}.$$

We know that $\sum_i \tau_i = 0$ from the assumption of fixed treatment effects.
It seems reasonable to also assume that

$$\sum_i (\tau\beta)_{ij} = 0,$$

since all the $(\tau\beta)_{ij}$ (for a fixed $j$) in the population are in the
sample. (See reference 1 for a similar approach.) Naturally the sum over $j$
for a fixed $i$ is not zero, because the $r$ blocks in this experiment are
assumed to be a random sample from a larger population. Similarly, for the
grand total,

$$G = \sum_j B_j = rpq\mu + pq\sum_j \beta_j + \sum_i\sum_j\sum_k \epsilon_{ijk}.$$

Hence

$$E(\mathrm{SSB}) = pq(r - 1)\sigma_b^2 + (r - 1)\sigma_e^2, \quad\text{and}\quad E(\mathrm{MSB}) = pq\sigma_b^2 + \sigma_e^2.$$

Because of the above restriction on the $(\tau\beta)$ effects, we shall use the
same definition used for nested sampling from a finite population, for
example, reference 2, to define $\sigma_{tb}^2$ as follows:

$$\sigma_{tb}^2 = E\Big[\sum_i (\tau\beta)_{ij}^2\Big]/(p - 1).$$
Using this definition of $\sigma_{tb}^2$, it is easy to show that the
expectations of MS(TB) and MSE are the same as in Sec. 22.2, as stated
earlier in this section. We see, for example, that

$$E\Big[\sum_{ijk} (Y_{ijk} - \mu - \tau_i)^2\Big] = rpq\sigma_b^2 + (p - 1)rq\sigma_{tb}^2 + rpq\sigma_e^2.$$

There are certain difficulties about the variance of Y when the treatments 
are finite; presumably the finite population correction should be used 
here, but we shall neglect it in the discussion which follows. Perhaps 
some of these theoretical difficulties would be resolved if we used the 
intraclass correlation concept, which is used by Hansen and Hurwitz 3 in 
considering finite populations for nested sampling. 

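Since the $\{\tau_i\}$ are assumed fixed,

$$E(\mathrm{SST}) = rq\sum_i \tau_i^2 + (p - 1)(q\sigma_{tb}^2 + \sigma_e^2),$$

$$E(\mathrm{MST}) = rq\theta(\tau) + \sigma_3^2,$$

where $\theta(\tau) = \sum_i \tau_i^2/(p - 1)$ and $\sigma_3^2 = q\sigma_{tb}^2 + \sigma_e^2$, as
in Sec. 22.2, and $\sigma_3^2$ is estimated by MS(TB).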

Hence the analysis of variance for this mixed model is as follows: 


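Source of variation   Degrees of freedom   Mean square      E(MS)

Blocks                $r - 1$              MSB = $V_1$      $\sigma_e^2 + pq\sigma_b^2 = \sigma_1^2$
Treatments            $p - 1$              MST = $V_2$      $\sigma_e^2 + q\sigma_{tb}^2 + rq\theta(\tau)$
T × B                 $(r - 1)(p - 1)$     MS(TB) = $V_3$   $\sigma_e^2 + q\sigma_{tb}^2 = \sigma_3^2$
Samples               $rp(q - 1)$          MSE = $V_4$      $\sigma_e^2 = \sigma_4^2$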


Since the $\tau$’s are fixed,

$$\sigma^2(\bar{y}) = \frac{\sigma_b^2}{r} + \frac{\sigma_{tb}^2}{rp} + \frac{\sigma_e^2}{rpq} = \frac{\sigma_1^2 + \sigma_3^2 - \sigma_4^2}{rpq} \doteq \frac{V_1 + V_3 - V_4}{rpq}.$$

The variance of a given treatment mean, $\bar{y}_i$, is

$$\sigma^2(\bar{y}_i) = \frac{\sigma_b^2}{r} + \frac{\sigma_{tb}^2}{r} + \frac{\sigma_e^2}{rq} = \frac{\sigma_1^2 + p\sigma_3^2 - \sigma_4^2}{rpq} \doteq \frac{V_1 + pV_3 - V_4}{rpq}.$$

But the principal comparison desired in this experiment is the difference,
$d$, between two treatment means,

$$\sigma^2(d) = \frac{2\sigma_3^2}{rq} \doteq \frac{2V_3}{rq}.$$

The first two comparisons have the degree-of-freedom complication,
mentioned in Sec. 22.2, but not $\sigma^2(d)$. The importance of the variance
components can be assessed as in Sec. 22.2 and confidence limits
approximated by any of the four methods mentioned in that section.

In the above analysis, $V_2$ is not distributed as $\chi^2\sigma_3^2/f_2$, because

$$E(T_i - rq\mu) = rq\tau_i \neq 0.$$

$V_2$ is distributed as a noncentral $\chi^2$ [$E(V_2) = \sigma_3^2 +$ constant]
with a variance of

$$\frac{2\sigma_3^2}{f_2}\left[\sigma_3^2 + 2rq\theta(\tau)\right].\dagger$$

It can be shown that an unbiased estimate of the variance of $V_2$, which
consists of both random and fixed components, is

$$s^2(V_2) = \frac{4}{f_2}V_2V_3 - \frac{2f_3}{(f_3 + 2)f_2}V_3^2.$$

In general, if the random component consists of several sources of
variation, which must be estimated from more than one mean square, we
can write for the mean square ($V$) with both random and fixed components

$$E(V) = \sigma_r^2 + \lambda\theta,$$

where $\sigma_r^2$ is the random and $\theta$ the fixed component. We can write

$$\sigma_r^2 = \sum_j a_j\sigma_j^2,$$

where $a_j = \pm 1$ and $\sigma_j^2 \doteq V_j$. That is, $\sigma_r^2$ can be thought of as
the sum and difference of several independent random variances, each
estimated by a single mean square in the analysis of variance. Let
$V = V_r + V_\theta$, where $\sigma_r^2 \doteq V_r$ and $\lambda\theta \doteq V_\theta$. Since the
$V$’s are all independent,

$$V_r = \sum_j a_jV_j \quad\text{and}\quad E(VV_r) = E(V_r)E(V) = \sigma_r^2(\sigma_r^2 + \lambda\theta).$$

Also, since $E(V_j^2) = (f_j + 2)\sigma_j^4/f_j$ and $E(V_jV_k) = \sigma_j^2\sigma_k^2$,

$$\sigma_r^4 = \sum_j \sigma_j^4 + 2\sum\sum_{j<k} a_ja_k\sigma_j^2\sigma_k^2 \doteq \sum_j \frac{f_j}{f_j + 2}V_j^2 + 2\sum\sum_{j<k} a_ja_kV_jV_k.$$

Collecting terms, we find that an unbiased estimate of the variance of a
mean square $V$, consisting of both random and fixed components, is

$$s^2(V) = \frac{4}{f}VV_r - \frac{2}{f}\Big[\sum_j \frac{f_j}{f_j + 2}V_j^2 + 2\sum\sum_{j<k} a_ja_kV_jV_k\Big],$$

with $V_r = \sum_j a_jV_j$ and $f$ degrees of freedom for $V$.
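
As a rough numerical check on this estimate, the following Python sketch evaluates it for an arbitrary set of independent mean squares and compares it with the special case given for $V_2$. The function name `var_mean_square_mixed` and the `(a_j, V_j, f_j)` tuple layout are illustrative assumptions, not part of the text; the mean squares are those of Example 23.1.

```python
from itertools import combinations


def var_mean_square_mixed(v, f, parts):
    """Unbiased estimate of Var(V) for a mean square V with f degrees of
    freedom whose expectation is sigma_r^2 + lambda*theta.

    parts is a list of (a_j, V_j, f_j), a_j = +1 or -1, describing the
    independent mean squares whose signed sum V_r estimates sigma_r^2.
    """
    v_r = sum(a * vj for a, vj, _ in parts)
    # Unbiased estimate of sigma_r^4: squared terms plus cross products.
    squares = sum(fj * vj ** 2 / (fj + 2) for _, vj, fj in parts)
    cross = sum(a1 * a2 * v1 * v2
                for (a1, v1, _), (a2, v2, _) in combinations(parts, 2))
    sigma_r4 = squares + 2 * cross
    return 4 * v * v_r / f - 2 * sigma_r4 / f


# Special case of Sec. 23.1: V = V_2 (treatments), V_r = V_3 = MS(TB).
v2, f2 = 5665.7, 8
v3, f3 = 308.1, 64
general = var_mean_square_mixed(v2, f2, [(+1, v3, f3)])
special = 4 * v2 * v3 / f2 - 2 * f3 * v3 ** 2 / ((f3 + 2) * f2)
print(general, special)
```

With a single random mean square the cross-product sum is empty, so the general expression should reduce to the formula for $s^2(V_2)$.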


† See references 4 and 5.

Example 23.1. An experiment was run to test the effectiveness of 9
different spray materials on cherry trees with 9 blocks (81 trees) and four
1-pound random samples of fruit from each of the 81 trees in the
experiment. The variable studied was the number of fruit per 1-pound sample,
and it is assumed that block and sample effects are random. The totals
of four samples for each of the trees were as shown in Table 23.1:


                                   Table 23.1

                                   Treatments
Blocks      0       1       2       3       4       5       6       7       8      Total

1          506     471     580     438     497     514     468     455     494     4,423
2          444     464     718     478     483     484     515     451     507     4,544
3          452     417     638     485     474     526     495     445     506     4,438
4          453     443     503     437     500     539     476     457     469     4,277
5          468     459     596     417     493     516     462     436     470     4,317
6          427     428     559     457     531     496     442     479     430     4,249
7          460     468     583     482     509     427     470     468     462     4,329
8          395     506     571     414     457     452     475     418     486     4,174
9          455     454     718     429     515     511     406     425     484     4,397

Total    4,060   4,110   5,466   4,037   4,459   4,465   4,209   4,034   4,308    39,148
Mean     112.8   114.2   151.8   112.1   123.9   124.0   116.9   112.1   119.7     120.8


The total sum of squares between samples for the same tree was 4,610. 
The other computations are as follows: 


$$C = \frac{(39{,}148)^2}{324};$$

$$\mathrm{SST} = \frac{(4{,}060)^2 + (4{,}110)^2 + \cdots + (4{,}308)^2}{36} - C = 45{,}326;$$

$$\mathrm{SSB} = \frac{(4{,}423)^2 + (4{,}544)^2 + \cdots + (4{,}397)^2}{36} - C = 2{,}804;$$

$$\mathrm{SS}(TB) = \frac{(506)^2 + (471)^2 + \cdots + (484)^2}{4} - C - \mathrm{SST} - \mathrm{SSB} = 19{,}722.$$


The analysis of variance is as follows: 


Source of variation   Degrees of freedom   Sum of squares   Mean square   E(MS)

Blocks                      8                  2,804            350.5     $\sigma_e^2 + 36\sigma_b^2$
Treatments                  8                 45,326          5,665.7     $\sigma_e^2 + 4\sigma_{tb}^2 + 36\theta(\tau)$
T × B                      64                 19,722            308.1     $\sigma_e^2 + 4\sigma_{tb}^2$
Samples in trees          243                  4,610            18.97     $\sigma_e^2$

The error term to test for treatment differences is MS(TB) = 308.1 with
64 degrees of freedom. Hence the $F$ value to make this test is

$$F = \frac{5{,}665.7}{308.1} = 18.39,$$

which is highly significant [$F_{.01}(8, 64) = 2.80$].

The estimate of $\sigma_{tb}^2$ is

$$s_{tb}^2 = \frac{308.1 - 18.97}{4} = 72.3.$$

The variance of $s_{tb}^2$ is

$$s^2(s_{tb}^2) = \frac{2}{16}\left[\frac{(308.1)^2}{66} + \frac{(18.97)^2}{245}\right] = 180.$$

Hence the 95 per cent normal two-tailed confidence limits for $\sigma_{tb}^2$ are

$$72.3 - 26.3 < \sigma_{tb}^2 < 72.3 + 26.3$$

or

$$46.0 < \sigma_{tb}^2 < 98.6.$$

It is obvious that this is the important source of variation, the sample-to- 
sample fluctuations being quite small. 

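The variance of the difference, $d$, between two treatment means is

$$\sigma^2(d) = \frac{2\sigma_e^2}{36} + \frac{2\sigma_{tb}^2}{9} = \frac{2(\sigma_e^2 + 4\sigma_{tb}^2)}{36}$$

and is estimated by

$$s^2(d) = 2\,\mathrm{MS}(TB)/36 = 17.1.$$

Hence the 95 per cent confidence limits for the true difference, $\delta$, are

$$d - 8.3 < \delta < d + 8.3.$$

The limits for $T_4 - T_0$ are

$$2.8 < \delta < 19.4.$$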
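
The arithmetic of this example can be retraced with a short Python sketch working from the tree totals of Table 23.1. The variable names are illustrative assumptions; the within-tree sum of squares (4,610) is taken from the text because the individual sample counts are not given, and the multipliers 1.96 and 2.00 are assumed approximate normal and $t$ (64 degrees of freedom) values.

```python
from math import sqrt

# Table 23.1: rows are blocks 1..9, columns are treatments 0..8;
# each entry is the total of q = 4 samples from one tree.
cells = [
    [506, 471, 580, 438, 497, 514, 468, 455, 494],
    [444, 464, 718, 478, 483, 484, 515, 451, 507],
    [452, 417, 638, 485, 474, 526, 495, 445, 506],
    [453, 443, 503, 437, 500, 539, 476, 457, 469],
    [468, 459, 596, 417, 493, 516, 462, 436, 470],
    [427, 428, 559, 457, 531, 496, 442, 479, 430],
    [460, 468, 583, 482, 509, 427, 470, 468, 462],
    [395, 506, 571, 414, 457, 452, 475, 418, 486],
    [455, 454, 718, 429, 515, 511, 406, 425, 484],
]
r, p, q = 9, 9, 4            # blocks, treatments, samples per tree
ss_samples = 4610.0          # within-tree sum of squares quoted in the text

grand = sum(map(sum, cells))
C = grand ** 2 / (r * p * q)                      # correction term
block_totals = [sum(row) for row in cells]
treat_totals = [sum(col) for col in zip(*cells)]

sst = sum(t ** 2 for t in treat_totals) / (r * q) - C
ssb = sum(b ** 2 for b in block_totals) / (p * q) - C
sstb = sum(y ** 2 for row in cells for y in row) / q - C - sst - ssb

f_t, f_tb, f_e = p - 1, (r - 1) * (p - 1), r * p * (q - 1)
mst, mstb, mse = sst / f_t, sstb / f_tb, ss_samples / f_e

F = mst / mstb                                    # test of treatments
s2_tb = (mstb - mse) / q                          # estimate of sigma_tb^2
var_s2_tb = (2 / q ** 2) * (mstb ** 2 / (f_tb + 2) + mse ** 2 / (f_e + 2))
half_width = 1.96 * sqrt(var_s2_tb)               # normal limits for sigma_tb^2

s2_d = 2 * mstb / (r * q)                         # variance of a difference
half_d = 2.00 * sqrt(s2_d)                        # assumed t value, 64 d.f.

print(round(sst), round(ssb), round(sstb))
print(round(F, 2), round(s2_tb, 1), round(var_s2_tb), round(s2_d, 1))
```

The sums of squares, the $F$ ratio, and the two half-widths obtained this way should agree with the figures quoted above to within rounding.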

Again the experimenter is cautioned against use of these confidence 
limits to test the greatest mean against the smallest, unless he has decided 
to compare these particular treatments (regardless of their ultimate 
standing) before running the experiment. If the results of the experiment 
are used to decide on the comparison to make, the probability levels are 
upset. The reader is again advised to read an article by Tukey 6 on this 
topic. 

23.2. Split-plot Designs. A special type of incomplete-blocks design
is the split-plot design, in which one set of $p$ treatments ($T$) is arranged
in a randomized complete-blocks design with $r$ blocks ($B$) and a second
set of $q$ treatments ($A$) is assigned at random to the $q$ subplots in each
of the $pr$ whole plots. The mathematical model for this experiment is

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \alpha_k + (\tau\alpha)_{ik} + \epsilon_{ijk},$$

where $\alpha$, $\tau$, and $(\tau\alpha)$ are fixed treatment effects, $(\tau\beta)_{ij}$ and
$\epsilon_{ijk}$ are random sources of error, and $\beta_j$ is a random block
effect.† The analysis of variance for this model is:


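Source of variation   Degrees of freedom      Sum of squares   Mean square   E(MS)†

Blocks (B)            $r - 1$                 SSB              $V_b$         $\sigma_e^2 + pq\sigma_b^2$
T                     $p - 1$                 SST              $V_t$         $\sigma_e^2 + q\sigma_{tb}^2 + rq\theta_1(\tau)$
T × B                 $(r - 1)(p - 1)$        SS(TB)           $V_{tb}$      $\sigma_e^2 + q\sigma_{tb}^2$
A                     $q - 1$                 SSA              $V_a$         $\sigma_e^2 + rp\theta_2(\alpha)$
T × A                 $(p - 1)(q - 1)$        SS(TA)           $V_{ta}$      $\sigma_e^2 + r\theta_3(\tau\alpha)$
Subplot error         $p(r - 1)(q - 1)$       SSE              $V_e$         $\sigma_e^2$


† $\theta_1(\tau)$, $\theta_2(\alpha)$, and $\theta_3(\tau\alpha)$ are sums of squares of fixed
effects divided by the degrees of freedom.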


The first five sums of squares are computed in the usual manner for
main effects and interactions (see Chap. 20) and the subplot error (SSE)
by subtraction. Note that we have omitted $\sigma_{tb}^2$ from the expectation of
the blocks mean square.

In making tests of significance of the differences among fixed effects,
SSE is the error sum of squares for $A$ and $TA$ and SS(TB) for $T$. The
problem of determining confidence limits for differences between various
treatment means is slightly complicated. We know that

$$(TA)_{ik} = r\mu + r\tau_i + \sum_j \beta_j + \sum_j (\tau\beta)_{ij} + r\alpha_k + r(\tau\alpha)_{ik} + \sum_j \epsilon_{ijk}.$$

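We have the following types of means to compare:

(i) Two $T$ treatments

$$d_t = \bar{T}_i - \bar{T}_{i'} = (\tau_i - \tau_{i'}) + \sum_j [(\tau\beta)_{ij} - (\tau\beta)_{i'j}]/r + \sum_j\sum_k (\epsilon_{ijk} - \epsilon_{i'jk})/rq,$$

$$E(d_t) = (\tau_i - \tau_{i'}),$$

$$\sigma^2(d_t) = 2(\sigma_e^2 + q\sigma_{tb}^2)/rq \doteq 2V_{tb}/rq.$$

† In this case we assume that there is no real $(\alpha\beta)$ or $(\tau\alpha\beta)$
interaction but that the only deviation in the subplot is a single random
error, $\epsilon_{ijk}$.

(v) Two different $A$ treatments and two different $T$ treatments

$$d_{ta} = \overline{(TA)}_{ik} - \overline{(TA)}_{i'k'} = (\tau_i - \tau_{i'}) + \sum_j [(\tau\beta)_{ij} - (\tau\beta)_{i'j}]/r + (\alpha_k - \alpha_{k'}) + [(\tau\alpha)_{ik} - (\tau\alpha)_{i'k'}] + \sum_j (\epsilon_{ijk} - \epsilon_{i'jk'})/r,$$

$$E(d_{ta}) = (\tau_i - \tau_{i'}) + (\alpha_k - \alpha_{k'}) + [(\tau\alpha)_{ik} - (\tau\alpha)_{i'k'}],$$

$$\sigma^2(d_{ta}) = 2(\sigma_{tb}^2 + \sigma_e^2)/r \doteq 2[V_{tb} + (q - 1)V_e]/rq.$$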
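
For reference, the variances of the comparisons that can be formed under the split-plot model may be collected in one place. The summary below is a sketch derived directly from the model with independent $(\tau\beta)_{ij}$ and $\epsilon_{ijk}$; the labeling of cases (ii) to (iv) is an assumption inferred from the comparisons worked in Example 23.2 and from the later remark that two error terms enter (iii) and (v).

```latex
\begin{align*}
\text{(i) two $T$ means:}\quad
  & \sigma^2(d) = \frac{2(\sigma_e^2 + q\sigma_{tb}^2)}{rq}
    \doteq \frac{2V_{tb}}{rq},\\
\text{(ii) two $A$ means:}\quad
  & \sigma^2(d) = \frac{2\sigma_e^2}{rp}
    \doteq \frac{2V_e}{rp},\\
\text{(iii) two $T$ means, same $A$:}\quad
  & \sigma^2(d) = \frac{2(\sigma_{tb}^2 + \sigma_e^2)}{r}
    \doteq \frac{2[V_{tb} + (q-1)V_e]}{rq},\\
\text{(iv) two $A$ means, same $T$:}\quad
  & \sigma^2(d) = \frac{2\sigma_e^2}{r}
    \doteq \frac{2V_e}{r},\\
\text{(v) different $A$ and different $T$:}\quad
  & \sigma^2(d) = \frac{2(\sigma_{tb}^2 + \sigma_e^2)}{r}
    \doteq \frac{2[V_{tb} + (q-1)V_e]}{rq}.
\end{align*}
```

In (ii) and (iv) the whole-plot effects cancel because both means are averaged over the same whole plots, which is why only $V_e$ appears there.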


It should be mentioned that if the experimenter wants to use (iii) or (v)
to test the difference between two treatment means, he does not know
how many degrees of freedom to use, since two error terms are involved.
The number of degrees of freedom will fall between $(r - 1)(p - 1)$
and $p(r - 1)(q - 1)$. Hence, if the difference is significant using
$(r - 1)(p - 1)$ degrees of freedom, it will be significant for the correct
degrees of freedom; and, conversely, if it is not significant using
$p(r - 1)(q - 1)$ degrees of freedom, we can conclude it is not significant.
However, if it is not significant for the smaller and is significant for the
larger number of degrees of freedom, the test is inconclusive. A rather
good approximation to the correct number of degrees of freedom is
furnished by Satterthwaite:⁷

$$f' = \frac{[V_{tb} + (q - 1)V_e]^2}{\dfrac{V_{tb}^2}{(r - 1)(p - 1)} + \dfrac{(q - 1)^2V_e^2}{p(r - 1)(q - 1)}}.$$

A more complete description of the split-plot design can be found in
references 8 and 9. This design enables the experimenter to obtain quite
accurate information on the $A$ treatments and on the interaction between
$T$ and $A$ at the expense of less accurate information on the $T$ treatments
as compared with a factorial design (Chap. 20), since the error mean
square for the $T$ treatments has an added component of variation and is
estimated with fewer degrees of freedom. Also, with this design, the
experimenter can allocate the $T$ treatments to rather large plots and
reserve the small subplots for the $A$ treatments. If one set of treatments is
cultivation methods and a second set is fertilizer treatments, it would be
advantageous to put the former on the larger whole plots because of the
difficulty of handling machinery on small plots.

Also this design is useful if the $A$ treatments are successive years in a
long-term experiment. Unfortunately the years cannot be randomized,
but it is not the year itself, but the effect of the year which we want to
randomize. And it seems reasonable to assume that nature does a fairly
good job of randomizing the year-to-year effects. But in this case it
seems more reasonable to regard the subplot treatments (years) as random
effects, and not fixed. And since they are random, the assumption
that the $(\alpha\beta)$ interaction is zero seems somewhat unrealistic. Hence for
this case we would advocate using the following model:

$$Y_{ijk} = \mu + \tau_i + \beta_j + (\tau\beta)_{ij} + \alpha_k + (\tau\alpha)_{ik} + (\alpha\beta)_{jk} + \epsilon_{ijk},$$

with everything random except $\mu$ and $\tau_i$. Of course, all interactions with
$\tau_i$ are assumed random in one direction only. Also, we still assume
$(\tau\alpha\beta)$ is zero, an assumption which does not strike us as being too bad
even when the $\alpha$’s are random effects.

Example 23.2. A corn-yield experiment was conducted to compare
four methods of primary seedbed preparation ($T$) and four methods of
planting ($A$), using four blocks ($B$). The seedbed preparations were used
on the whole plots, and each whole plot was divided into four subplots for
the four planting methods. The corn yields in bushels per acre were as
shown in Table 23.2.


                         Table 23.2

                        Planting methods
Replication      a1        a2        a3        a4       Total

                  t1  plowed 7 in.
1               81.8      46.2      78.6      77.7       284.3
2               72.2      51.6      70.9      73.6       268.3
3               72.9      53.6      69.8      70.3       266.6
4               74.6      57.0      69.6      72.3       273.5
Total          301.5     208.4     288.9     293.9     1,092.7

                  t2  plowed 4 in.
1               74.1      49.1      72.0      66.1       261.3
2               76.2      53.8      71.8      65.5       267.3
3               71.1      43.7      67.6      66.2       248.6
4               67.8      58.8      60.6      60.6       247.8
Total          289.2     205.4     272.0     258.4     1,025.0

                  t3  blank basin listed
1               68.4      54.5      72.0      70.6       265.5
2               68.2      47.6      76.7      75.4       267.9
3               67.1      46.4      70.7      66.2       250.4
4               65.6      53.3      65.6      69.2       253.7
Total          269.3     201.8     285.0     281.4     1,037.5

                  t4  disk-harrowed
1               71.5      50.9      76.4      75.1       273.9
2               70.4      65.0      75.8      75.8       287.0
3               72.5      54.9      67.6      75.2       270.2
4               67.8      50.2      65.6      63.3       246.9
Total          282.2     221.0     285.4     289.4     1,078.0

Grand total                                            4,233.2


The four planting methods were surface drill ($a_1$), list ($a_2$), basin list
($a_3$), and list bedding sweeps ($a_4$). The $T$ and $A$ totals are:


          Sum       Mean                   Sum       Mean

T_1     1092.7     68.29          A_1    1142.2     71.39
T_2     1025.0     64.06          A_2     836.6     52.29
T_3     1037.5     64.84          A_3    1131.3     70.71
T_4     1078.0     67.38          A_4    1123.1     70.19


The analysis of variance is as follows:


Source of variation     Degrees of freedom   Sum of squares   Mean square   E(MS)

Blocks                        3                  223.81           74.60     $\sigma_e^2 + 16\sigma_b^2$
Seedbed methods (T)           3                  194.56           64.85     $\sigma_e^2 + 4\sigma_{tb}^2 + 16\theta_1(\tau)$
T × B                         9                  158.25           17.58     $\sigma_e^2 + 4\sigma_{tb}^2$
Planting methods (A)          3                4,107.39        1,369.13     $\sigma_e^2 + 16\theta_2(\alpha)$
T × A                         9                  221.74           24.64     $\sigma_e^2 + 4\theta_3(\tau\alpha)$
Subplot error                36                  608.47           16.90     $\sigma_e^2$


There are highly significant differences between the planting methods
($F$ = 1,369.13/16.90 with 3 and 36 degrees of freedom) but not between
the seedbed methods (although the $F$ = 64.85/17.58 = 3.69 is almost
significant, since $F_{.05}$ = 3.86 for 3 and 9 degrees of freedom). There is
no evidence of any interaction between the two sets of treatments. An
examination of the planting means shows that only $a_2$ is really different
from the rest. In order to adjust for the fact that we have selected the
smallest from a set of four means, which can be done in 4 ways, the
significance probability assuming random choice might be multiplied by
4 to obtain an estimate of the true significance probability. If we take
the linear form $l = A_1 + A_3 + A_4 - 3A_2 = 886.8$, we find that
$\sigma^2(l) = 192\sigma_e^2 \doteq 3{,}244.80$. Hence

$$t = 886.8/\sqrt{3{,}244.80} = 15.6,$$

for which the estimated significance probability is much less than
4(.0005) = .002. And we see that this 1 degree of freedom accounts for
almost all of SSA. The sum of squares for $l$ is $(886.8)^2/192 = 4{,}095.91$,
leaving only 11.48 for the other 2 degrees of freedom, a decidedly
nonsignificant amount.
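
The analysis of variance and the single-degree-of-freedom comparison can be retraced from the yields of Table 23.2 with the following Python sketch. The array layout `y[i][j][k]` and all variable names are illustrative assumptions; the computing formulas are the usual ones for totals described above, with the subplot error obtained by subtraction.

```python
from math import sqrt

# Table 23.2: y[i][j][k] = yield for seedbed method i, block j,
# planting method k (a1, a2, a3, a4).
y = [
    [[81.8, 46.2, 78.6, 77.7], [72.2, 51.6, 70.9, 73.6],
     [72.9, 53.6, 69.8, 70.3], [74.6, 57.0, 69.6, 72.3]],
    [[74.1, 49.1, 72.0, 66.1], [76.2, 53.8, 71.8, 65.5],
     [71.1, 43.7, 67.6, 66.2], [67.8, 58.8, 60.6, 60.6]],
    [[68.4, 54.5, 72.0, 70.6], [68.2, 47.6, 76.7, 75.4],
     [67.1, 46.4, 70.7, 66.2], [65.6, 53.3, 65.6, 69.2]],
    [[71.5, 50.9, 76.4, 75.1], [70.4, 65.0, 75.8, 75.8],
     [72.5, 54.9, 67.6, 75.2], [67.8, 50.2, 65.6, 63.3]],
]
p, r, q = 4, 4, 4
I, J, K = range(p), range(r), range(q)

grand = sum(y[i][j][k] for i in I for j in J for k in K)
C = grand ** 2 / (p * r * q)


def ss(totals, n):
    """Sum of squares for totals that are each based on n observations."""
    return sum(t ** 2 for t in totals) / n - C


T = [sum(y[i][j][k] for j in J for k in K) for i in I]
B = [sum(y[i][j][k] for i in I for k in K) for j in J]
A = [sum(y[i][j][k] for i in I for j in J) for k in K]
whole = [sum(y[i][j]) for i in I for j in J]              # whole-plot totals
ta = [sum(y[i][j][k] for j in J) for i in I for k in K]   # T x A totals

ss_t, ss_b, ss_a = ss(T, r * q), ss(B, p * q), ss(A, p * r)
ss_tb = ss(whole, q) - ss_t - ss_b
ss_ta = ss(ta, r) - ss_t - ss_a
ss_total = sum(v ** 2 for i in I for j in J for v in y[i][j]) - C
ss_e = ss_total - ss_b - ss_t - ss_tb - ss_a - ss_ta      # by subtraction

ms_tb = ss_tb / ((r - 1) * (p - 1))
ms_e = ss_e / (p * (r - 1) * (q - 1))
F_T = (ss_t / (p - 1)) / ms_tb        # whole-plot test of T
F_A = (ss_a / (q - 1)) / ms_e         # subplot test of A

# Comparison l = A1 + A3 + A4 - 3*A2 on the planting-method totals.
coef = [1, -3, 1, 1]
divisor = p * r * sum(c ** 2 for c in coef)               # 16 * 12 = 192
l = sum(c * a for c, a in zip(coef, A))
t = l / sqrt(divisor * ms_e)
ss_l = l ** 2 / divisor

print(round(ss_b, 2), round(ss_t, 2), round(ss_tb, 2),
      round(ss_a, 2), round(ss_ta, 2), round(ss_e, 2))
print(round(F_T, 2), round(F_A, 2), round(t, 1), round(ss_l, 2))
```

Small differences in the last digit from the tabled sums of squares are to be expected from rounding.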

The variance of the difference between two $T$ means is

$$2(17.58)/16 = 2.20.$$

The variance of the mean difference between two $T$ treatments with the
same $A$ treatment is $2[17.58 + 3(16.90)]/16 = 8.54$. This would apply to
testing $t_1$ versus $t_2$ when planting method $a_1$ was used, the mean
difference being 3.075 in this case. And finally the variance to compare two
$A$ treatments with the same $T$ treatment is $2(16.90)/4 = 8.45$. This would
apply to $a_1$ versus $a_2$ for $t_1$, the mean difference being 23.275.
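
These three variances, together with the Satterthwaite degrees of freedom that would go with the second of them, can be checked with a few lines of Python. This is only a sketch using the mean squares of the analysis of variance above; the variable names are assumptions made for illustration.

```python
p = r = q = 4
v_tb, f_tb = 17.58, (r - 1) * (p - 1)        # whole-plot error, 9 d.f.
v_e, f_e = 16.90, p * (r - 1) * (q - 1)      # subplot error, 36 d.f.

# Two T means (averaged over all A treatments).
var_two_T = 2 * v_tb / (r * q)

# Two T treatments with the same A treatment: two error terms are involved.
combined = v_tb + (q - 1) * v_e
var_two_T_same_A = 2 * combined / (r * q)

# Two A treatments with the same T treatment.
var_two_A_same_T = 2 * v_e / r

# Satterthwaite approximation to the degrees of freedom of `combined`.
f_prime = combined ** 2 / (v_tb ** 2 / f_tb + ((q - 1) * v_e) ** 2 / f_e)

print(round(var_two_T, 2), round(var_two_T_same_A, 2),
      round(var_two_A_same_T, 2), round(f_prime, 1))
```

The approximate degrees of freedom necessarily fall between the total of the two error degrees of freedom and the smaller of them, which gives a quick sanity check on the computation.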

Example 23.3. Wilm¹¹ presents some data on soil moisture deficits
(in inches of water) as affected by timber cutting with 4 blocks, 5 $T$
treatments (amount of cutting), and 3 years as subplot treatments. The
$T$ treatments were in terms of volume of board feet of trees larger than
9.6 inches in diameter which were left in the forest after treatment:
uncut (11,900 feet); 6,000 feet; 4,000 feet; 2,000 feet; all cut. The years
were 1941, 1942, and 1943. In this case it might be reasonable to regard
the years as random and add the (year × block) random effect to the
model. The analysis of variance for these data was:


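Source of variation   Degrees of freedom   Mean square   E(MS)

Blocks (B)                  3                1.4832      $\sigma_e^2 + 5\sigma_{ba}^2 + 15\sigma_b^2$
Treatments (T)              4                1.3333      $\sigma_e^2 + 3\sigma_{tb}^2 + 4\sigma_{ta}^2 + 12\theta(\tau)$
T × B                      12                 .3909      $\sigma_e^2 + 3\sigma_{tb}^2$
Years (A)                   2                6.5418      $\sigma_e^2 + 5\sigma_{ba}^2 + 20\sigma_a^2$
T × A                       8                 .2554      $\sigma_e^2 + 4\sigma_{ta}^2$
B × A                       6                 .1294      $\sigma_e^2 + 5\sigma_{ba}^2$
Remainder (E)              24                 .1053      $\sigma_e^2$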


Using $F = .1294/.1053$, we see that $\sigma_{ba}^2$ does not test significantly
greater than zero; and using $F = .2554/.1053 = 2.43$, we see that $\sigma_{ta}^2$
tests significant at the $\alpha = .05$ probability level. The test of $\sigma_a^2$ is
not too good, since there are only 6 degrees of freedom for error, but
$F = 6.5418/.1294$ is significant at the $\alpha = .001$ level. The difficult test
to make involves $\theta(\tau)$. There is no single error term to make this test.
Two different tests have been proposed. Both tests are based on
the Satterthwaite approximation.⁴,⁷ In the first case, we use an estimate
of $(\sigma_e^2 + 3\sigma_{tb}^2 + 4\sigma_{ta}^2)$ as the error term for MST. This estimate is
$V_{tb} + V_{ta} - V_e = V_r$ with $f'$ degrees of freedom, where

$$f' = \frac{V_r^2}{(V_{tb}^2/f_{tb}) + (V_{ta}^2/f_{ta}) + (V_e^2/f_e)}$$

and the $f$’s are the corresponding degrees of freedom. Then we assume
that $F' = V_t/V_r$ is approximately distributed as $F$ with ($f_t$ and $f'$) degrees
of freedom. Cochran¹² suggests that we add $V_e$ to $V_t$ and use

$$F'' = \frac{V_t + V_e}{V_{tb} + V_{ta}}$$

with $f_1''$ and $f_2''$ degrees of freedom. $f_1''$ and $f_2''$ would have to be computed
as $f'$. There is some indication that $F''$ is more powerful than $F'$ in
detecting real treatment effects, but there is some doubt that it is enough
better to afford to compute 2 estimated degrees of freedom instead of 1.

In the Wilm problem, $V_r = .5410$, and $F' = 1.3333/.5410 = 2.46$ with
degrees of freedom 4 and

$$f' = \frac{(.5410)^2}{\dfrac{(.3909)^2}{12} + \dfrac{(.2554)^2}{8} + \dfrac{(.1053)^2}{24}} \approx 14.$$

$F'$ is not significant at the 5 per cent level; actually $\alpha$ is about .10 in
this case. Similarly

$$F'' = 1.4386/.6463 = 2.23$$

with (5, 20) degrees of freedom. Again $\alpha$ is about .10.
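
Both approximate tests can be reproduced with a short Python sketch. The helper name `satterthwaite_df` and the `(a_j, V_j, f_j)` tuple layout are illustrative assumptions; the mean squares and degrees of freedom are those of the analysis of variance above.

```python
def satterthwaite_df(terms):
    """Approximate degrees of freedom for a signed sum of independent
    mean squares; terms is a list of (a_j, V_j, f_j) with a_j = +1 or -1."""
    total = sum(a * v for a, v, _ in terms)
    return total ** 2 / sum(v ** 2 / f for _, v, f in terms)


v_t, f_t = 1.3333, 4        # treatments
v_tb, f_tb = 0.3909, 12     # T x B
v_ta, f_ta = 0.2554, 8      # T x A
v_e, f_e = 0.1053, 24       # remainder

# First test: F' = V_t / (V_tb + V_ta - V_e).
error_1 = [(+1, v_tb, f_tb), (+1, v_ta, f_ta), (-1, v_e, f_e)]
v_r = sum(a * v for a, v, _ in error_1)
F_prime = v_t / v_r
f_prime = satterthwaite_df(error_1)

# Second test: F'' = (V_t + V_e) / (V_tb + V_ta).
numerator = [(+1, v_t, f_t), (+1, v_e, f_e)]
denominator = [(+1, v_tb, f_tb), (+1, v_ta, f_ta)]
F_double = (v_t + v_e) / (v_tb + v_ta)
f1_double = satterthwaite_df(numerator)
f2_double = satterthwaite_df(denominator)

print(round(v_r, 4), round(F_prime, 2), round(f_prime, 1))
print(round(F_double, 2), round(f1_double, 1), round(f2_double, 1))
```

Rounding the three estimated degrees of freedom to whole numbers gives the values 14 and (5, 20) used in the text.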

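The variance of the difference between any two $T$ treatment means is

$$\sigma^2(d) = 2\left[\frac{\sigma_e^2}{12} + \frac{\sigma_{tb}^2}{4} + \frac{\sigma_{ta}^2}{3}\right] \doteq \frac{2V_r}{12} = .090.$$

Hence the 95 per cent confidence limits for $\delta$ will be

$$d - (2.15)(.30) < \delta < d + (2.15)(.30),$$

where we use a $t$ value with $f' = 14$ degrees of freedom.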

EXERCISES 

23.1. Prove that the expected value of $s^2$ in Sec. 18.3 is actually
$\sigma_e^2 + \sigma_{tb}^2$ if there is an interaction.

23.2. Reexamine the examples and exercises in Sec. 18.3 and Chap. 20
to see in which experiments it is logical to assume the blocks are
representative of a wider set of experimental conditions than in the given
experiment.

(a) How would the cities in Exercise 18.9 and the plants in Exercise
18.10 have to be selected to make this condition true?

(b) How can you rationalize that consecutive days or consecutive 
years might have random effects on crop yields or other experimental 
results, such as in Example 18.3? 

(c) Where does this rationale fall down in the analysis of economic 
data, such as market prices? 

23.3. What assumptions must be made concerning the effects in a 
Latin-square design in order to extend the results to a wider horizon than 
the given experiment? 

23.4. Determine the four approximate 90 per cent confidence intervals
for $\sigma_{tb}^2$ in Example 23.1.

23.5. (a) In Exercise 20.3, what changes should be made in the analysis
if the four blocks and five varieties constituted representative samples
from large universes of blocks and varieties?

(b) What are the expectations of the mean squares under this assump¬ 
tion? 

(c) Determine estimates of the random variance components, and 
make the necessary tests of significance. 

(d) Make a test of spacing differences, and determine the standard 
error of the difference between two spacing means if p' varieties and r’ 
blocks were used. 

23.6. A field experiment was conducted to determine an acceptable 
lower limit on the size of similar future experiments. The analysis of 
variance was: 


Source of variation           Degrees of freedom   Mean square

Blocks                               2                4,399
Treatments                           4                4,429
T × B                                8                  866
Samples in plots                    15                  239
Determinations in samples           30                    7


(a) Assuming everything random, except treatments, set up the expec¬ 
tations of the mean squares, and determine estimates of the random 
components. 

(b) What is the standard error of the difference between two treatment
means with b blocks, s samples per plot, and d determinations per sample?
Which of these three sampling plans (which cost the same) would be
favored: b = 6, s = 2, d = 1; b = 3, s = 5, d = 1; b = 4, s = 2, d = 2?

(c) Determine the 90 per cent confidence limits for a%. 

23.7. Show how the variance components in Example 23.3 were 
obtained, and also a 2 (d). 

23.8. Derive an approximate test for Example 23.2 to test whether the 
average yield for fa and fa was significantly greater than the average of fa 
and fa. 

23.9. Johnson and Tsao 13 conducted a psychological experiment to
determine the difference limen (DL) of subjects for weights increasing at
constant rates. Two classes of people were chosen as to sight: normal
(A) and congenitally blind (B). Two males and two females were
selected to represent each class, giving a total of 8 people. Then the
average of five DL values was determined for each subject for each of 28
rate-weight combinations, 7 weights and 4 rates. The entire experiment
was repeated at a later date with different people. Hence there was a
total of 28 X 8 X 2 = 448 average DL's. The order of presentation of
the 28 combinations was randomized for each subject at each date.

(a) Set up the analysis of variance as to sources of variation, degrees of 
freedom, and expectations of the mean squares. 

(b) How would the analysis be changed if each of the five DL values 
was recorded instead of a single average being reported? 

(c) A portion of the data (for 3 weights and 2 rates) is reproduced in 
the accompanying table for computational purposes. 


Data of Exercise 23.9

                                 Weight 1        Weight 2        Weight 3
                                   Rate            Rate            Rate
Sex   Sight   Date   Individual   a      b        a      b        a      b

M     A       1      1           4.5   10.9      4.5   10.1      4.5    8.6
              1      2          14.0   30.5     13.9   25.5     12.2   29.3
              2      1           3.1    6.2      3.3    6.8      3.5    7.0
              2      2          13.4   26.2     12.8   23.5     12.8   27.8

M     B       1      1          24.2   48.1     25.3   41.2     25.1   31.4
              1      2          19.3   41.0     21.1   30.1     19.6   35.0
              2      1          41.2   59.1     29.8   59.7     28.5   48.7
              2      2          19.9   44.1     18.3   28.5     15.9   31.8

F     A       1      1          18.5   22.3     10.2   25.2     11.4   19.7
              1      2           3.1    7.0      3.9    5.4      3.6    7.3
              2      1          11.2   21.8      8.8   14.0      8.1   21.8
              2      2           3.9   11.4      5.1   10.2      3.7    8.7

F     B       1      1           9.6   17.9      7.3   18.2      6.5   15.8
              1      2           9.0   16.1      6.4   15.9      6.9   12.9
              2      1           6.1   13.9      6.0   12.5      6.4   12.1
              2      2           8.6   14.5      6.9   14.2      5.3   12.0


23.10. Professor J. Wishart 10 furnishes us this example. “The seed
mixtures denoted by A, B, C, D below were sown under wheat in 1938, the
treatments being randomized over the plots of each block. In the
summer of 1939, Blocks I, IV, VI and VII were grazed and Blocks II, III, V
and VIII were cut for hay (each successive pair of blocks was taken as a
unit and a random choice made as to which block should be grazed and
which cut for hay). The table below gives the estimated weights, in
pounds, of clover (green) in the May 1940 cut. Analyze these weights
fully and draw attention to the significant results. The sum of the 32
weights is 527.4 and the sum of their squares is 9,410.18.”


      I               II              III              IV

 C      B         B      A         C      D         D      B
 8.7   14.1      15.2   14.8      16.3   22.4      25.1   25.3

 A      D         C      D         A      B         C      A
 8.8   13.2       8.7    9.4       9.4   13.9      19.6   18.1


      V               VI              VII             VIII

 C      A         C      B         C      B         D      A
22.2   21.9      22.4   16.6      19.6   18.6      19.9   17.4

 B      D         A      D         D      A         B      C
21.5   20.6      16.0   13.6      13.3   13.3      14.4   13.1

23.11. Suppose in Example 20.2 that the plot yields were available for 
each of the 12 years with the following sums of squares for years and inter¬ 
actions with years: 


Years (Y)                       165,808
Years X treatments (YT)           6,185
Years X replications (YR)        19,554
Remainder (E)                    44,490


(a) Set up the complete analysis of variance for this experiment.
Remember that the analysis in Example 20.2 was based on the total 
yields for 12 years, while the above sums of squares are based on annual 
yields. 

(b) What conclusions would you draw from this analysis? 

(c) What aspects of these data might violate the assumptions behind 
the use of the analysis of variance in making tests of significance? Do 
you have any suggestions for improving the analysis? 

23.12. In Exercise 22.4, assume that the treatment effects are fixed.

(a) Derive the expectation of the treatment mean square. 

(b) How would you test for treatment differences? 

23.13. In a bee experiment conducted by E. Oertel in Louisiana, the
honey was collected from 32 colonies randomly assigned to 4 rows of 8
colonies each for each of 5 years. It seems reasonable to assume that
colony and year effects are random and row effects are fixed. We note
that this is a completely randomized design for the whole plots (rows are
T treatments). The analysis of variance for the yields in pounds of
honey per year was:



(a) Set up a mathematical model for this experiment. 

(b) Fill in the degrees-of-freedom column, and show that the E(MS)
column is correct.

(c) What do you conclude regarding the colony-to-colony variation in
the same row?

(d) Is there much evidence of a real row-by-year interaction in this
problem?

(e) What is the most important source of variation? What are
approximate 90 per cent confidence limits for this component?

(f) How would you test for row differences?

(g) The mean productions per colony per year for each row were:
182.5, 149.2, 158.9, and 135.6. Show that the standard error for the
difference between any two row means is 12.6.

23.14. Suppose data were available on the average sales price per pound
of tobacco for a period of y years at each of m markets; these markets
are divided into 4 geographical areas with $m_1$, $m_2$, $m_3$, and $m_4$
markets per area.



(a) Set up an analysis of variance to reflect the sources of variation in 
the marketing picture, assuming that areas are fixed and the remaining 
components are random. 

(b) Determine the expectations of the mean squares.

(c) What is the variance of a mean over all 4 areas with m' markets per 
area and y' years? 

23.15. A randomized complete-blocks experiment was conducted on
the percentage of acid in orange juice with 4 blocks and 5 fertilizers, each
plot having 3 trees. Two samples were taken from each tree, one sample
being picked by one man and the second sample by another man. The
basic analysis of variance was as follows:


Source of variation     Degrees of freedom    Mean square    E(MS)

Blocks (B)                      3               .002315
Fertilizers (T)                 4               .042064
T X B                          12               .009133
Trees in plots (E)             40               .004735
Men (A)                         1               .012813
T X A                           4               .003924
A X B                           3               .003449
Remainder                      52               .002749



(a) Set up a model for this experiment, assuming we want to make 
inferences about only these fertilizers, but that all other effects are repre¬ 
sentative of large populations. 

(b) Compute the expectations of the mean squares for your model. 

(c) Compute estimates of the variance components. 

(d) How would you test for fertilizer differences? 

(e) What is the variance of the mean amount of acid for any one
fertilizer if r' blocks, q' trees per plot, and m' samples are used? Compute
this variance for r' = 4, q' = 3, m' = 2; r' = 6, q' = 1, m' = 4.

23.16. (a) In Sec. 20.6, show that if the replication effects are assumed
to be random, we can obtain the following analysis for the (2r − 1)
degrees of freedom between blocks:


Source of variation 

Degrees of 
freedom 

Mean 

square 

F(MS) 

Replications ( R ) 

r — 1 

MSR 


ACD 

1 

MS(ACD) 

<t\ + 4cr^ + 4r0(r) 

(ACD) X R 

r - 1 

MSE'f 

A + 4*-2, 


t MSE' is the whole-plot error mean square, and <r\, is the variance for the (replica¬ 
tion) X (ACD) error. 

(b) Apply this result to test the NPK interaction in Exercise 20.15.


CHAPTER 24


THE RECOVERY OF INTERBLOCK INFORMATION 
IN INCOMPLETE-BLOCKS DESIGNS 

24.1. Balanced Incomplete-blocks Designs. It was mentioned in
Sec. 19.3 that Yates 1 introduced a new computing technique for balanced
incomplete-blocks designs which utilized the so-called interblock
information on treatment effects. The basic change in the model is that the
$\beta$'s are assumed to be random effects with the same variance,
$\sigma_b^2$, the plot-to-plot (intrablock) error being designated by
$\sigma_e^2$. From Sec. 19.3 we know that the intrablock estimate, $t_i$,
of $\tau_i$ had a variance

$$\sigma^2(t_i) = \frac{k-1}{kE_f}\cdot\frac{1}{rE_fW_1} = \frac{p-1}{p}\cdot\frac{1}{rE_fW_1},$$

where $W_1 = 1/\sigma_e^2$. We can obtain a second estimate of $\tau_i$ (the
interblock estimate) from $\sum_j n_{ij}B_j = T_{bi}$ (see Sec. 19.3).

$$T_{bi} - \frac{k}{p}G = (r-\lambda)\tau_i + k\sum_j n_{ij}\beta_j - \frac{k^2}{p}\sum_j \beta_j + \sum_j n_{ij}\sum_{i'} n_{i'j}\epsilon_{i'j} - \frac{k}{p}\sum_{i'}\sum_j n_{i'j}\epsilon_{i'j},$$

$$\frac{T_{bi} - (k/p)G}{r-\lambda} = t_{bi},$$

$$\sigma^2(t_{bi}) = \frac{rk(p-k)(k\sigma_b^2 + \sigma_e^2)}{p(r-\lambda)^2} = \frac{p-1}{p}\cdot\frac{k}{(r-\lambda)W_2},$$

where $W_2 = 1/(k\sigma_b^2 + \sigma_e^2)$.

If we have two independent estimates ($t_1$ and $t_2$) of the same quantity,
$\tau$, the combined estimate with minimum variance is given by

$$\frac{W_1't_1 + W_2't_2}{W_1' + W_2'},$$

where $W_i' = 1/\sigma^2(t_i)$. Hence the combined estimate of
$\tau_i'\ (= \tau_i + \mu)$ is

$$t'_{ic} = t_{ic} + \frac{G}{rp} = \frac{rE_fW_1t_i + \dfrac{(r-\lambda)W_2}{k}\,t_{bi}}{rE_fW_1 + \dfrac{(r-\lambda)W_2}{k}} + \frac{G}{rp}$$

$$= \frac{\dfrac{(kT_i - T_{bi})W_1}{k} + \dfrac{T_{bi}W_2}{k} - \dfrac{kGW_2}{pk}}{rE_fW_1 + \dfrac{(r-\lambda)W_2}{k}} + \frac{G}{rp} = \frac{T_i}{r} + \theta D_i = \frac{T_i + r\theta D_i}{r},$$

where $\theta = (W_1 - W_2)/r[p(k-1)W_1 + (p-k)W_2]$
and $D_i = (p-k)T_i - (p-1)T_{bi} + (k-1)G$.

Certain identities are useful in making the above simplification:

$$(r-\lambda)(p-1) = r(p-k) \quad\text{and}\quad kE_f(p-1) = p(k-1).$$

The variance of the difference between two adjusted means, using
interblock information, is

$$\frac{2k(p-1)}{r[p(k-1)W_1 + (p-k)W_2]}.$$

The computations needed to estimate $W_1$ and $W_2$ are the following:

(i) SSB(unadj) in the usual manner.

(ii) $SST_b = \dfrac{\sum_i T_{bi}^2}{k(r-\lambda)} - \dfrac{(\sum_i T_{bi})^2}{pk(r-\lambda)}$.

(iii) $SSD = \dfrac{\sum_i D_i^2}{N(p-k)(k-1)}$.

(iv) SSB(adj) = SSD + SSB(unadj) − $SST_b$, with (q − 1) degrees of
freedom. $V_b$ = MSB(adj).

(v) SSE = $Sy^2$ − SSB(adj) − SST(unadj), with N − p − q + 1
degrees of freedom. $V_e$ = MSE.

(vi) It can be shown that $E(V_b) = \sigma_e^2 + \dfrac{N-p}{q-1}\,\sigma_b^2$. Hence

$$W_2 = \frac{N-p}{k(q-1)V_b - (p-k)V_e}.$$

(vii) Compute $r\theta$ and then $t'_{ic} = (T_i + r\theta D_i)/r$.

In general it is also advisable to compute SST(adj) as a check on the
computation and to make a test of the differences among the treatment
means. If the experiment is large enough so that we can replace $W_1$ and
$W_2$ by their estimates in the formula for $t'_{ic}$, the problem is solved. If
there are fewer than 15 degrees of freedom in $V_b$, the authors do not
recommend using this weighted analysis; instead, they advise using the
unweighted analysis of Sec. 19.3 (using no interblock information). And
with 15 to 25 degrees of freedom, the weights are not too stable.

If groups of blocks form c complete replications, the sum of squares for
replications with (c − 1) degrees of freedom is removed from SSB(adj)
and the expected value for $V_b$ [with (q − c) degrees of freedom] is

$$E(V_b) = \sigma_e^2 + \frac{N - p - k(c-1)}{q-c}\,\sigma_b^2.$$

Hence

$$W_2 = \frac{N - p - k(c-1)}{k(q-c)V_b - (p-k)V_e}.$$

With $p = k^2$ and c = r = k + 1, these become

$$E(V_b) = \sigma_e^2 + \frac{k(r-1)}{r}\,\sigma_b^2, \qquad W_2 = \frac{r-1}{rV_b - V_e}.$$

This is the result for balanced lattices with d = 1.
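The weight formulas above lend themselves to a small computational check. The following is only a sketch: the function name `interblock_weights` and its argument names are hypothetical, and the formula coded is the one for c complete replications as read from this section (c = 1 giving the case with no grouping). The sample call uses the balanced-lattice dimensions p = 9, k = 3, q = 12, N = 36, c = 4 together with the mean squares quoted in Example 24.1.

```python
def interblock_weights(N, p, q, k, V_b, V_e, c=1):
    """Estimated weights for a balanced incomplete-blocks design.

    N plots, p treatments, q blocks of k plots, c complete replications
    (c = 1 means the blocks are not grouped into replications).
    V_b is the adjusted-blocks mean square, V_e the intrablock error.
    """
    W1 = 1.0 / V_e  # intrablock weight, 1 / sigma_e^2
    numerator = N - p - k * (c - 1)
    denominator = k * (q - c) * V_b - (p - k) * V_e
    W2 = numerator / denominator  # interblock weight, 1 / (k sigma_b^2 + sigma_e^2)
    return W1, W2


# Balanced lattice of Example 24.1 (values taken from the text).
W1, W2 = interblock_weights(N=36, p=9, q=12, k=3, V_b=48.63, V_e=28.57, c=4)
print(round(W1, 5), round(W2, 5))
```

For these dimensions the general expression reduces to 3/(4V_b − V_e), which is the form used in the worked example.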

Example 24.1. Suppose we estimate $\sigma_b^2$ from the balanced-lattice
experiment of Example 19.1. The values of $D_i = 6T_i - 8T_{bi} + 2G$ are
242, −6, −276, 118, 74, −28, −80, −76, and 32.

$$SST_b = \frac{(152)^2 + \cdots + (188)^2}{9} - \frac{(1{,}512)^2}{81}, \qquad SSD = \frac{(242)^2 + \cdots + (32)^2}{432} = 389,$$

SSB(adj) = 389,

since $SST_b$ = SSB(unadj) − SS(replications).

$$V_b = \frac{389}{8} = 48.63, \qquad V_e = 28.57.$$

$$W_2 = \frac{3}{4V_b - V_e} = \frac{3}{4(48.63) - 28.57} = .01808, \qquad W_1 = \frac{1}{V_e} = .03500,$$

$$\theta = \frac{W_1 - W_2}{4(18W_1 + 6W_2)} = \frac{V_b - V_e}{72V_b} = \frac{20.06}{3{,}501} = .00573.$$


The adjusted treatment means, $t'_{ic} = (T_i/r) + \theta D_i$, are 20.13, 11.72, 2.92,
22.93, 14.17, 11.34, 13.54, 7.06, 22.18. The variance and standard error
of the difference between two adjusted means are

$$\frac{48}{4(18W_1 + 6W_2)} = \frac{2}{3W_1 + W_2} = 16.25 \quad\text{and}\quad \sqrt{16.25} = 4.03.$$

Using recovery of interblock information, the relative efficiency of this
design compared with randomized complete blocks is

I = 17.64/16.25 = 1.09.

Hence the recovery resulted in a gain of 16 per cent over no recovery.
However, with only 8 degrees of freedom to estimate $\sigma_b^2$, we believe that
this is a rather fictitious gain since the estimates of the weights ($W_1$ and
$W_2$) have considerable error.
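The arithmetic of Example 24.1 can be retraced with a few lines of code. This is a sketch under stated assumptions: the variable names and the helper `adjusted_mean` are hypothetical, the mean squares 48.63 and 28.57 are those quoted in the example, and the balanced-lattice weight is taken as W_2 = (r − 1)/(rV_b − V_e).

```python
# Balanced lattice of Example 24.1: p = 9 treatments, k = 3 plots per block, r = 4.
r, p, k = 4, 9, 3
V_b, V_e = 48.63, 28.57          # mean squares quoted in the example

W1 = 1.0 / V_e                   # intrablock weight
W2 = (r - 1) / (r * V_b - V_e)   # interblock weight for a balanced lattice

S = p * (k - 1) * W1 + (p - k) * W2      # 18 W1 + 6 W2 for this design
theta = (W1 - W2) / (r * S)              # weighting factor applied to D_i
var_diff = 2 * k * (p - 1) / (r * S)     # variance of a difference of adjusted means
std_err = var_diff ** 0.5


def adjusted_mean(T_i, D_i):
    """Adjusted treatment mean T_i / r + theta * D_i (hypothetical helper)."""
    return T_i / r + theta * D_i


print(round(theta, 5), round(var_diff, 2), round(std_err, 2))
```

With equal weights (W_1 = W_2) the factor theta vanishes and `adjusted_mean` returns the unadjusted mean T_i/r.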

24.2. Simple Lattice Designs. In Sec. 19.4, we indicated that most
analyses of lattice designs also make use of interblock information to
increase the efficiency of the estimates of treatment effects. Since the
blocks are arranged in complete replications, a constant $\gamma$ is introduced
into the model for fixed replication effects, with random $\beta$ effects within
replications. As before, we assume that we have $p = k^2$ treatments in k
blocks of k treatments per block. Since the simple lattice has 2
replications, we have 2k blocks and $2k^2$ plots.

The first replication is designated as X and the second as Y with
individual yields $X_{ij}$ and $Y_{ij}$, where (i,j) = 1, 2, . . . , k. The
mathematical model for $X_{ij}$ and $Y_{ij}$ is

$$X_{ij} = \mu + \gamma_x + \beta_{xi} + \tau_{ij} + \epsilon_{ij},$$
$$Y_{ij} = \mu + \gamma_y + \beta_{yj} + \tau_{ij} + \epsilon'_{ij},$$

where $\beta$, $\epsilon$, and $\epsilon'$ are independent random effects and the others are
fixed; $\{\beta\}$ are NID$(0,\sigma_b^2)$ and $\{\epsilon,\epsilon'\}$ are NID$(0,\sigma_e^2)$. Hence

$$G = 2k^2\mu + k\Big(\sum_i \beta_{xi} + \sum_j \beta_{yj}\Big) + \sum_i\sum_j(\epsilon_{ij} + \epsilon'_{ij}),$$
$$X_{..} = k^2(\mu + \gamma_x) + k\sum_i \beta_{xi} + \sum_i\sum_j \epsilon_{ij},$$
$$Y_{..} = k^2(\mu + \gamma_y) + k\sum_j \beta_{yj} + \sum_i\sum_j \epsilon'_{ij},$$
$$X_{i.} = k(\mu + \gamma_x) + k\beta_{xi} + \sum_j \tau_{ij} + \sum_j \epsilon_{ij},$$
$$Y_{i.} = k(\mu + \gamma_y) + \sum_j \beta_{yj} + \sum_j \tau_{ij} + \sum_j \epsilon'_{ij},$$
$$D_{i.} = X_{i.} - Y_{i.} = k(\gamma_x - \gamma_y) + k\beta_{xi} - \sum_j \beta_{yj} + \sum_j(\epsilon_{ij} - \epsilon'_{ij}),$$
$$D_{..} = k^2(\gamma_x - \gamma_y) + k\Big(\sum_i \beta_{xi} - \sum_j \beta_{yj}\Big) + \sum_i\sum_j(\epsilon_{ij} - \epsilon'_{ij}),$$

where $D_{i.}$ is a linear function of the block effects adjusted for treatment
effects. Similarly we can compute $D_{.j}$, using $X_{.j}$ and $Y_{.j}$. These (2k)
D's could be used to solve for the $\beta$'s, adjusted for the $\tau$'s.

We can use G, $X_{..}$, and $Y_{..}$ to estimate $\mu$, $\gamma_x$, and $\gamma_y$. The replication
sum of squares in the analysis of variance (with 1 degree of freedom) is

$$\frac{X_{..}^2 + Y_{..}^2}{k^2} - \frac{G^2}{2k^2} = \frac{(X_{..} - Y_{..})^2}{2k^2},$$

with an expected value of

$$\frac{k^2}{2}(\gamma_x - \gamma_y)^2 + k\sigma_b^2 + \sigma_e^2.$$

A block sum of squares, adjusted for treatments, with 2(k − 1) degrees
of freedom, can be computed from

$$\text{SSB(adj)} = \frac{\sum_i D_{i.}^2 + \sum_j D_{.j}^2}{2k} - \frac{2D_{..}^2}{2k^2},$$

$$E[\text{SSB(adj)}] = 2(k-1)\left(\tfrac{1}{2}k\sigma_b^2 + \sigma_e^2\right).$$

Hence if we designate $V_b$ = SSB(adj)/2(k − 1),

$$E(V_b) = \sigma_e^2 + \tfrac{1}{2}k\sigma_b^2.$$

It can be shown that

SSE = $Sx^2 + Sy^2$ − SST(unadj) − SSB(adj),

with $2(k^2 - 1) - (k^2 - 1) - 2(k - 1) = (k - 1)^2$ degrees of freedom.
Also,

$$E(V_e) = E(\text{MSE}) = \sigma_e^2,$$

so that

$$\hat\sigma_e^2 = V_e \quad\text{and}\quad \hat\sigma_b^2 = 2(V_b - V_e)/k.$$

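In order to estimate a treatment effect, $\tau_{ij}$, we shall write†

$$\tau_{ij} = (\bar\tau_{i.} + \bar\tau_{.j}) + (\tau_{ij} - \bar\tau_{i.} - \bar\tau_{.j})$$
$$= (\bar\tau'_{i.} + \bar\tau'_{.j} - 2\mu) + (\tau'_{ij} - \bar\tau'_{i.} - \bar\tau'_{.j} + \mu) = t + v.$$

In Sec. 19.4, we used $\bar Y_{i.}$ as an estimate of $\bar\tau'_{i.}$ and $\bar X_{.j}$ as an estimate of
$\bar\tau'_{.j}$, since these two estimates were free of block effects. However, now
that we have a means of handling the block effects, it seems reasonable to
utilize this interblock information.

It turns out that the part in the second parentheses (v) is independent
of block effects, even when we use all of the data to estimate $\bar\tau'_{i.}$ and $\bar\tau'_{.j}$.
That is, if we use all of the data,

$$t'_{i.} = \frac{X_{i.} + Y_{i.}}{2k} = \mu + \bar\tau_{i.} + \frac{\beta_{xi} + \bar\beta_{y.}}{2} + \frac{\bar\epsilon_{i.} + \bar\epsilon'_{i.}}{2},$$
$$t'_{.j} = \frac{X_{.j} + Y_{.j}}{2k} = \mu + \bar\tau_{.j} + \frac{\bar\beta_{x.} + \beta_{yj}}{2} + \frac{\bar\epsilon_{.j} + \bar\epsilon'_{.j}}{2},$$
$$t'_{ij} = \frac{X_{ij} + Y_{ij}}{2} = \mu + \tau_{ij} + \frac{\beta_{xi} + \beta_{yj}}{2} + \frac{\epsilon_{ij} + \epsilon'_{ij}}{2},$$
$$m = \frac{G}{2k^2} = \mu + \frac{\bar\beta_{x.} + \bar\beta_{y.}}{2} + \frac{\bar\epsilon_{..} + \bar\epsilon'_{..}}{2},$$

where $\bar\epsilon_{i.} = \sum_j \epsilon_{ij}/k$, $\bar\epsilon_{..} = \sum_i\sum_j \epsilon_{ij}/k^2$, etc. Hence

$$\hat v = t'_{ij} - t'_{i.} - t'_{.j} + m = \tau_{ij} - \bar\tau_{i.} - \bar\tau_{.j} + e_v,$$

where $e_v = \tfrac{1}{2}[(\epsilon_{ij} + \epsilon'_{ij}) - (\bar\epsilon_{i.} + \bar\epsilon'_{i.} + \bar\epsilon_{.j} + \bar\epsilon'_{.j}) + (\bar\epsilon_{..} + \bar\epsilon'_{..})]$.

† This method is suggested by the similarity between this design and a factorial
with k levels of two treatments. The effect of the ith level of one and jth level of a
second is generally written in this form. Let t stand for the main effects (first
parentheses) and v for the interaction (second parentheses).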

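However, if we use all of the data to estimate $t = (\bar\tau'_{i.} + \bar\tau'_{.j} - 2\mu)$,
block effects will be involved. Hence it is proposed that we obtain two
different estimates of t, one free of block effects ($t_1$) and one involving
block effects ($t_2$), and form a weighted average of the two,

$$\frac{W_1't_1 + W_2't_2}{W_1' + W_2'},$$

where $W_i' = 1/\sigma^2(t_i)$. Then we shall have

$$t_1 = \frac{Y_{i.} + X_{.j}}{k} - \frac{G}{k^2} = \bar\tau_{.j} + \bar\tau_{i.} + e_1,$$
$$t_2 = \frac{X_{i.} + Y_{.j}}{k} - \frac{G}{k^2} = \bar\tau_{.j} + \bar\tau_{i.} + b_2 + e_2,$$

where $e_1 = (\bar\epsilon_{.j} + \bar\epsilon'_{i.} - \bar\epsilon_{..} - \bar\epsilon'_{..})$, $e_2 = (\bar\epsilon_{i.} + \bar\epsilon'_{.j} - \bar\epsilon_{..} - \bar\epsilon'_{..})$, and
$b_2 = \beta_{xi} + \beta_{yj} - \bar\beta_{x.} - \bar\beta_{y.}$. Hence

$$\sigma^2(t_1) = \frac{2(k-1)}{k^2}\,\sigma_e^2$$

and

$$\sigma^2(t_2) = \frac{2(k-1)}{k^2}\,(k\sigma_b^2 + \sigma_e^2).$$

Therefore $\hat t = [W_1t_1 + W_2t_2]/(W_1 + W_2)$, where $W_1 = 1/\sigma_e^2$ and
$W_2 = 1/(k\sigma_b^2 + \sigma_e^2)$.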

Again we caution against the use of a weighted analysis if there are fewer
than 15 degrees of freedom for estimating $\sigma_b^2$ (and we have grave doubts
when there are fewer than 25 degrees of freedom). It is better to duplicate
the experiment to obtain more information on $\sigma_b^2$ if weighting is to be used.
In computing adjusted treatment means, some simplification can be
made. We note that



Note that $\delta_{ij} = (D_{i.} - D_{.j})$ is the same as given in Sec. 19.4. The only
difference in $t'_{ij}$ is that $\delta_{ij}$ is multiplied by the weighting factor

$$\theta = \frac{W_1 - W_2}{2k(W_1 + W_2)}$$

instead of 1/2k. If $\sigma_b^2 = 0$, so that $W_1 = W_2$, no adjustment at all is made
for blocks: $t'_{ij} = T_{ij}/2$. If $\sigma_e^2$ is small compared with $\sigma_b^2$, $\theta$ approaches
1/2k, the value used without recovery of interblock information.

It can be shown that the variance of the difference between two
adjusted treatment means, which appear in the same block, is

$$\frac{1}{kW_1}\left[(k-1) + \frac{2W_1}{W_1 + W_2}\right].$$

And for varieties not in the same block, this variance is

$$\frac{1}{kW_1}\left[(k-2) + \frac{4W_1}{W_1 + W_2}\right].$$

The average variance is

$$\frac{1}{(k+1)W_1}\left[(k-1) + \frac{4W_1}{W_1 + W_2}\right] = \frac{1}{W_1}\left[1 + \frac{4k\theta}{k+1}\right].$$

Also, the efficiency of the simple lattice, relative to randomized complete
blocks, is

$$I = \frac{k + W_1/W_2}{(k-1) + 4W_1/(W_1 + W_2)}.$$


The reader is advised to read references 2 and 3 for more details on lattice 
designs. 
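The variance and efficiency expressions for the simple lattice can be gathered into one routine. This is a sketch, not the authors' computing scheme: the function name `simple_lattice_summary` and the numerical inputs are hypothetical, the weights are estimated as W_1 = 1/V_e and W_2 = 1/(2V_b − V_e), and when V_b does not exceed V_e the block variance is taken as zero so that W_2 = W_1.

```python
def simple_lattice_summary(V_b, V_e, k):
    """Weights, variances, and relative efficiency for a simple lattice (2 replications)."""
    W1 = 1.0 / V_e
    # A negative estimate of the block variance is replaced by zero (W2 = W1).
    W2 = 1.0 / (2.0 * V_b - V_e) if V_b > V_e else W1
    a = W1 / (W1 + W2)
    theta = (W1 - W2) / (2 * k * (W1 + W2))
    return {
        "theta": theta,
        "var_same_block": ((k - 1) + 2 * a) / (k * W1),
        "var_other_block": ((k - 2) + 4 * a) / (k * W1),
        "var_average": ((k - 1) + 4 * a) / ((k + 1) * W1),
        "efficiency": (k + W1 / W2) / ((k - 1) + 4 * a),
    }


# Hypothetical mean squares, chosen only to exercise the formulas.
summary = simple_lattice_summary(V_b=60.0, V_e=20.0, k=3)
for name, value in summary.items():
    print(name, round(value, 4))
```

Each treatment shares a block with 2(k − 1) others and not with the remaining (k − 1)^2, so the average variance is the corresponding weighted mean of the two pairwise variances; the code can be used to confirm this numerically.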

If the simple lattice is duplicated several times (Sec. 19.4) so that 2d
replications (2kd blocks) are used, it can be shown that the expected
value of the “block-effect” mean square with 2(d − 1)(k − 1) degrees
of freedom is $(\sigma_e^2 + k\sigma_b^2)$. The expected value of the other mean square
with 2(k − 1) degrees of freedom is, as above, $(\sigma_e^2 + \tfrac{1}{2}k\sigma_b^2)$. Hence, if
the two mean squares are pooled to obtain the adjusted-blocks mean
square with 2d(k − 1) degrees of freedom, the expected value of this
mean square is

$$\sigma_e^2 + \frac{2d-1}{2d}\,k\sigma_b^2.$$

In this latter case,

$$W_2 = \frac{2d-1}{2dV_b - V_e} \quad\text{and}\quad \theta = \frac{d(V_b - V_e)}{2k[dV_b + (d-1)V_e]}.$$

The variances of adjusted mean differences will be divided by d.
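As a check on the duplicated-lattice formulas, the weighting factor can be computed two ways: from the weights W_1 and W_2, and directly from the mean squares. The sketch below assumes W_1 = 1/V_e and W_2 = (2d − 1)/(2dV_b − V_e); the function name and the numerical values are hypothetical.

```python
def duplicated_lattice_theta(V_b, V_e, k, d):
    """Return theta computed from the weights and directly from the mean squares."""
    W1 = 1.0 / V_e
    W2 = (2 * d - 1) / (2 * d * V_b - V_e)
    theta_from_weights = (W1 - W2) / (2 * k * (W1 + W2))
    theta_direct = d * (V_b - V_e) / (2 * k * (d * V_b + (d - 1) * V_e))
    return theta_from_weights, theta_direct


# Hypothetical mean squares for a simple lattice duplicated twice (d = 2).
t_w, t_d = duplicated_lattice_theta(V_b=60.0, V_e=20.0, k=3, d=2)
print(round(t_w, 6), round(t_d, 6))
```

The two values agree because W_1 − W_2 and W_1 + W_2 share the denominator V_e(2dV_b − V_e), which cancels in the ratio.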

Example 24.2. In order to illustrate only the computing procedure
for the recovery of interblock information, we shall use Example 19.2.
Actually one should not contemplate any recovery with so few degrees
of freedom for blocks.

            D_i.      D_.j
            -22         2
            -21       -16
             -3       -32
  Total     -46       -46

$$\text{SSB(adj)} = \frac{S(D_{i.}^2 + D_{.j}^2)}{6} - \frac{(-46)^2}{9} = 134.$$

As a check, SSB(unadj) + SST(adj) should be equal to SSB(adj) +
SST(unadj). We note that

68 + 593 = 134 + 527 = 661.

It is advisable to make this check on the computing. The analysis of
variance is:

Source of variation     Degrees of freedom    Mean square    E(MS)

Replications                    1                118
Treatments (adj)                8                 74
Blocks (adj)                    4                 34 = V_b   $\sigma_e^2 + 1.5\sigma_b^2$
Error                           4                 47 = V_e   $\sigma_e^2$

The estimate of $\sigma_b^2$ would be negative, and so we assume $\sigma_b^2 = 0$.
Hence, no adjustments would be made even if there were sufficient
degrees of freedom to estimate the weights. This raises a rather critical
issue regarding incomplete-blocks designs, namely, that we do not take
the loss in efficiency which we really suffered by using such a design with
a negative estimate of $\sigma_b^2$. Instead we use $t'_{ij} = T_{ij}/2$, with the variance
of a mean equal to $V_e/2$ = 24.
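The computing steps of Example 24.2 are short enough to retrace in code. This sketch uses the D values tabulated in the example with k = 3 and the rounded mean squares 34 and 47 from the analysis of variance; the variable names are hypothetical. Because the tabulated D values are whole numbers, the sum of squares comes out near 134.6 rather than exactly the 134 quoted.

```python
k = 3
D_row = [-22, -21, -3]   # D_i. values from the example
D_col = [2, -16, -32]    # D_.j values from the example
D_total = sum(D_row)     # both sets of D's add to the same total

# Adjusted-blocks sum of squares and mean square, 2(k - 1) degrees of freedom.
ssb_adj = sum(d * d for d in D_row + D_col) / (2 * k) - D_total ** 2 / k ** 2
V_b_from_D = ssb_adj / (2 * (k - 1))

# Variance-component estimate from the rounded mean squares in the table.
V_b, V_e = 34.0, 47.0
sigma_b2_hat = 2 * (V_b - V_e) / k

print(round(ssb_adj, 1), round(V_b_from_D, 1), round(sigma_b2_hat, 2))
```

A negative value of `sigma_b2_hat` is the situation described in the example, where the block variance is taken as zero and no adjustment is made.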

If $V_b > V_e$ so we could obtain a positive estimate of $\sigma_b^2$, and if there were
sufficient degrees of freedom to warrant weighting, we would compute
$\delta_{ij} = D_{i.} - D_{.j}$. Then $\theta = (W_1 - W_2)/6(W_1 + W_2)$, where $W_1 = 1/V_e$
and $W_2 = 1/(2V_b - V_e)$. (Here we assume that $\theta$ = 0.) An adjusted
treatment mean is



The estimated average variance of the difference between two adjusted
means is

$$\frac{V_e}{2}\left[1 + \frac{2V_b - V_e}{V_b}\right].$$

24.3. Other Lattice Designs. Reference 3 contains the theory and
computing details for a triple lattice and reference 2 the computing
procedures for rectangular lattices. One of the present authors presented
the method of analysis for d duplications of any orthogonal square lattice
design.† If there are r replications in the basic design (rd total
replications) and if the two parts of the adjusted-blocks sum of squares are
pooled with rd(k − 1) degrees of freedom,

$$W_2 = \frac{rd-1}{rdV_b - V_e} \quad\text{and}\quad \theta = \frac{W_1 - W_2}{rk[(r-1)W_1 + W_2]}.$$

The average variance of the difference between two adjusted means is

$$\frac{2}{rdW_1}\left[1 + \frac{r^2k\theta}{k+1}\right].$$

EXERCISES 

24.1. (a) Prove that, if we have two unbiased independent estimates,
$t_1$ and $t_2$, of a parameter $\tau$ with respective variances $1/W_1'$ and $1/W_2'$, the
combined linear unbiased estimate with lowest variance will be

$$t = \frac{W_1't_1 + W_2't_2}{W_1' + W_2'}.$$

Hint: remember that $E(t) = \tau$.

(b) Show that $t_1$ and $t_2$ are independent estimates of $\tau$ in Sec. 24.1.

24.2. (a) Derive the formula for $\sigma^2(t_{bi})$, and show that $t'_{ic} = (T_i/r) + \theta D_i$.

(b) Also derive the variance of the difference between two adjusted
means.

(c) Show that SSB(adj) is independent of treatment effects.

(d) Show that $E(V_b)$ and the estimate of $W_2$ are correct in (vi) above.

† Presented by R. L. Anderson at a meeting of the Biometrics Section of the
American Statistical Association, March, 1946. An abstract is presented in the June,
1946, issue of the Biometrics Bulletin (p. 58).

24.3. As a computing exercise only, analyze the data in Exercise 19.1,
using recovery of interblock information. Why is it dangerous in
practice to use recovery for this example?

24.4. Analyze the data in Exercises 19.2 and 19.3, using recovery of
interblock information.

24.5. Prove that $V_e$ is an unbiased estimate of $\sigma_e^2$. In other words,
show that SSE = $Sx^2 + Sy^2$ − SST(unadj) − SSB(adj).

24.6. On page 363, show that the results for $\hat\sigma_e^2$ and $\hat\sigma_b^2$ are correct.
Why is $(2V_b - V_e)$ an unbiased estimate of $(\sigma_e^2 + k\sigma_b^2)$?

24.7. Derive the formulas for the variances of the differences between
two adjusted means, using a simple lattice design. Also, derive the
formula for relative efficiency, I.

24.8. (a) Analyze the data in Exercise 19.4, using recovery of
interblock information.

(b) What is the relative efficiency of this design compared with a
randomized complete-blocks design?


CHAPTER 25


OTHER TOPICS CONCERNING COMPONENTS OF VARIANCE: 
SUMMARY OF NEEDED RESEARCH 

25.1. Covariance. Cochran 1 considered the following covariance
model for a sampling design with p plots and r samples per plot,

$$Y_{ij} = \mu + \tau_i + \beta z_{ij} + \epsilon_{ij},$$

where the $\tau$'s and $\epsilon$'s are assumed NID with zero means and respective
variances of $\sigma_\tau^2$ and $\sigma_e^2$, and $\mu$, $\beta$, and the z's are assumed fixed. The
maximum-likelihood estimate of $\beta$ is a weighted mean of the b's derived from
the plots ($b_t$) and error ($b_e$) lines of the analysis of covariance [see Sec.
21.2 (v) with treatments corresponding to plots].

If information about $\beta$ in the plots line is disregarded, the procedure
is as follows:

(i) $s^{*2}$ is an unbiased estimate of $\sigma_e^2$ and SSE* is distributed as
$\sigma_e^2\chi^2_{p(r-1)-1}$.

(ii) $$\text{SST*} = r\sum_i(\bar Y_{i.} - \bar Y_{..} - b_t\bar z_{i.})^2 + \frac{T_{zz}E_{zz}}{T_{zz} + E_{zz}}(b_t - b_e)^2,$$
where $\bar Y_{i.} = \mu + \tau_i + \beta\bar z_{i.} + \bar\epsilon_{i.}$.
The first term is distributed as $(\sigma_e^2 + r\sigma_\tau^2)\chi^2_{(p-2)}$. Since

$$b_t = \beta + \frac{r\sum_i(\tau_i + \bar\epsilon_{i.})\bar z_{i.}}{T_{zz}}, \qquad b_e = \beta + \frac{\sum_i\sum_j \epsilon_{ij}(z_{ij} - \bar z_{i.})}{E_{zz}},$$

the second term is distributed as $(\sigma_e^2 + \lambda r\sigma_\tau^2)\chi^2_1$, where $\lambda = E_{zz}/(T_{zz} + E_{zz})$.

(iii) All three chi-squares are independent.

(iv) $$\hat\sigma_\tau^2 = \frac{(p-1)(\text{MST*} - s^{*2})}{r(p - 2 + \lambda)}.$$

The estimate of $\sigma_\tau^2$ in (iv) involves some loss of information because plot
information about $\beta$ is ignored, and the single degree of freedom in SST*
is presumably given too much weight.

If the estimate of $\beta$ from the error line is used, parallel results can be
obtained for a two-way classification, for example, randomized blocks.
25.2. The Use of Components of Variance in Regression Problems.
Variance components are also used to evaluate methods of estimating
regression coefficients. The experimenter often obtains several values
of Y for each value of X in order to determine the sampling or
observational error (see Example 13.1 and Fig. 13.3). The model for r
independent variates with p samples of Y for each set of $\{X_i\}$ values is

$$Y_{jk} = \mu + \sum_{i=1}^{r}\beta_ix_{ij} + \epsilon_j + \delta_{jk}, \qquad j = 1, 2, \ldots, n;\ k = 1, 2, \ldots, p,$$

where $\epsilon$ and $\delta$ are NID with zero means and respective variances $\sigma_e^2$
and $\sigma_d^2$. $\sigma_e^2$ measures the failure of the regression line to go through the
average Y values ($\bar Y_{j.}$), while $\sigma_d^2$ measures the fluctuation of individual
values $Y_{jk}$ about their means $\bar Y_{j.}$.

The analysis of variance of the residuals will have the following
appearance:

Error                          Degrees of freedom    Mean square    E(MS)

Deviations from regression        n − r − 1             V_e         $\sigma^2 = \sigma_d^2 + p\sigma_e^2$
Observations                      n(p − 1)              V_d         $\sigma_d^2$

In this case the experimenter can determine whether he should add more
sets of X's or collect more values of Y for each set, basing his conclusions
on the relative size of $\sigma_e^2$ and $\sigma_d^2$ as compared with the relative costs of
obtaining more sets or more observations per set.

Very few examples are available of planned experiments with the same
number of observations per set (other than one observation). In many
cases, there will be experiments with several values of Y for a given $X_1$,
but the values of $X_2$, $X_3$, . . . vary as well as Y (see the vitamin $B_2$
example in Chap. 15). Hence we are led to use the one-error regression
equation. Another difficulty often arises, namely, that the observation
error tends to increase with increasing Y. This difficulty is not avoided
when we use only one observation per set, but it is usually neglected
(see Sec. 14.4). If several observations were taken for each set of X
values, the experimenter would have some information as to the
uniformity of his variation as Y increased.

The factorial design is an example of several observations for each set of
X's; however, the X values are often qualitative rather than quantitative
(different varieties, cultivation methods, teaching methods, etc.).
When a factorial is used with different levels of the factors, the principle
of multiple regression with several observations for each set of X values
is being applied. However, there are seldom enough levels of the
various factors to estimate the deviation error (for example, see Exercises
20.4, 20.6, and 20.7). The article referred to in Exercise 20.11 was
based on estimating a quadratic regression of yield on planting date of
cotton with 10 equally spaced planting dates. In this case $\hat\sigma_d^2$ = .3606
and $\hat\sigma_d^2 + p\hat\sigma_e^2$ = .3357, a value less than the estimate of $\sigma_d^2$. This is an
all too common result: the estimate of $\sigma_e^2$ is negative. This indicates a
need for a more thorough investigation for many regression problems
of the accuracy of the assumptions of homogeneous variance from point
to point and independence of the true residuals.

Example 25.1. A survey was conducted in eastern North Carolina in
1949 to estimate the relationship between per cent dry weight (Y) of
Irish potatoes (Cobbler variety) and X = 200 (specific gravity − 1.045).
The values of X were −1, 0, 1, . . . , 11, with 8 samples and 2
determinations per sample for many samples. If there had been potatoes
for each of the 13 X classes for each sample and determination, there
would have been 104 samples and 208 determinations. Unfortunately,
there were few samples for X = −1 and X = 11 and many blanks in
other places, so that only 127 determinations were obtained. The data
were as shown in Table 25.1.


Table 25.1

                                      X
Sample     0     1     2     3     4     5     6†    7     8     9    10    11

1 

14.34 

14.73 

16.30 

17.46 

18.82 

19.48 

20.40 

20.67 

21.75 




14.66 

14.87 

16.79 

17.23 

17.68 

19.16 

19.90 

21.27 

22.70 



2 

14.12 

15.04 

16.50 

16.75 

17.90 

18.91 

20.35 

20.72 

22.28 

23.45 

24.98 


14.82 

15.88 

16.43 

17.08 

18.39 


19.32 

20.89 

22.28 23.31 

24.53 








19.68 





3 

13.58 

15.48 

15.80 

17.38 

17.86 

18.81 

19.68 

20.78 21.83 

23.50 

24.60 


13.68 

14.07 

15.60 

16.80 

18.30 

18.84 

19.90 

20.86 

22.17 

23.39 

24.80 

4 

13.83 

15.04 

16.17 

17.63 

17.90 

19.15 

20.11 

20.69 





13.92 

15.36 

16.10 

16.68 

17.88 

20.18 21.14 

21.05 




5 

13.53 

14.34 

15.60 

16.78 

17.66 


19.08 

20.88 

21.71 

23.25 

24.80 

6 


15.30 

15.60 

16.63 

17.98 


19.38 

20.56 

22.78 


24.80 

7 

14.09 

14.10 

15.50 

16.48 

17.63 


19.10 

20.34 

21.71 

22.73 

23.60 25.50 

8‡

13.86 

14.63 

15.14 

16.49 

17.10 

18.51 

19.18 

20.26 

21.44 

22.43 



13.61 

14.63 

14.90 

16.27 

17.20 

17.97 

19.08 

20.80 

21.64 




† Three determinations on the second sample.

‡ There was 1 determination = 13.05 for X = −1.

The analysis for the third sample is as follows:

Source of variation     Degrees of freedom    Mean square    E(MS)

Regression                      1               253.593
Deviations                      9                  .1086     $\sigma_d^2 + 2\sigma_e^2$
Determinations                 11                  .1269     $\sigma_d^2$

Again we note a negative estimate of $\sigma_e^2$. For the 5 samples with duplicate
determinations, 2 had negative estimates of $\sigma_e^2$, 2 had positive estimates,
and 1 was almost zero (slightly positive).
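The negative estimate noted for the third sample follows directly from equating the two residual mean squares to their expectations. The sketch below assumes p = 2 determinations at every X for this sample (suggested by the 11 degrees of freedom for determinations); the function name `regression_components` is hypothetical.

```python
def regression_components(ms_deviations, ms_observations, p):
    """Estimate (sigma_d^2, sigma_e^2) from the residual mean squares.

    E(ms_observations) = sigma_d^2
    E(ms_deviations)   = sigma_d^2 + p * sigma_e^2
    """
    sigma_d2 = ms_observations
    sigma_e2 = (ms_deviations - ms_observations) / p
    return sigma_d2, sigma_e2


# Mean squares for the third sample of Example 25.1.
sigma_d2, sigma_e2 = regression_components(0.1086, 0.1269, p=2)
print(sigma_d2, round(sigma_e2, 5))
```

The estimate of the deviation component is negative whenever the deviations mean square falls below the determinations mean square, as it does here.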

An over-all regression analysis for all 8 samples gave $\hat\sigma_d^2$ = .1362 and
$\hat\sigma_d^2 + \lambda\hat\sigma_e^2$ = .1804. The value of $\lambda$ is somewhere between 1 and 2. A
rough approximation is the number of determinations divided by the
number of samples, or about 1.6.

Example 25.2. In Example 13.1, we used the total error sum of squares
to estimate the error mean square to test whether or not $\beta$ was zero. If
we disregard the fact that the error variance tended to increase with
increasing X, we could subdivide the error sum of squares into two parts
as follows:

Error                              Degrees of    Sum of     Mean      E(MS)
                                   freedom       squares    square

Deviations from regression             47           407      8.66     $\sigma_d^2 + \lambda\sigma_e^2$†
Between families for a given X        860         1,139      1.32     $\sigma_d^2$

† $\lambda$ is the average number of families for each X.

A rough approximation for $\lambda$ would be 909/49 = 18.6. A more accurate
estimate of $\lambda$ is given by use of the formulas given in Sec. 22.4 (see
Exercise 22.18).


where the n* are the number of families for X = Xi (see Table 13.1). 
In this case ^ nf = 33,161 and 

X = ^ (909 - 36.5) = = 18.6, 

which is the same as above (to one decimal place). 
If Z = log (Y + 1) is used in the regression equation, the error splits
up as follows:

Error                             Degrees of freedom   Sum of squares   Mean square
Deviations from regression                47               2.121           .0451
Between families for a given X           860              22.429           .0261


In this case we have reduced the deviations part substantially by making the
variances more nearly equal. But notice what happens for the incomes less than
14,000 when a log transformation is used. In this case the estimate of the
between-families variance is .0214, and the estimate of that variance plus
$\lambda$ times the deviations component is .0233, indicating that the
deviations component is practically zero. It would appear that extremely
large estimates of the deviations component may be caused by unequal
sampling variability along the regression line. Some theoretical study
should be made on this point.
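The effect of the transformation is easier to see when the deviations component is expressed relative to the between-families mean square in each scale. The sketch below uses the mean squares from the two tables of Example 25.2 and the rough value 18.6 for the coefficient; the function name and the choice of this ratio as a summary are assumptions made for illustration.

```python
LAM = 18.6  # average number of families per X value, as quoted in the example


def relative_component(ms_deviations, ms_between, lam=LAM):
    """Return (component estimate, component / between-families mean square)."""
    component = (ms_deviations - ms_between) / lam
    return component, component / ms_between


raw_scale = relative_component(8.66, 1.32)       # analysis of Y
log_scale = relative_component(0.0451, 0.0261)   # analysis of Z = log(Y + 1)

print("raw scale: component = %.4f, ratio = %.3f" % raw_scale)
print("log scale: component = %.5f, ratio = %.3f" % log_scale)
```

On this summary the deviations component is a much smaller fraction of the between-families variation after the log transformation than before it.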

25.3. Other Topics in Relation to Variance Components. The industrial
and engineering research men have need for variance components in
studying the precision of measurements. This topic is still in the exploratory
stage and is too complicated to be discussed here. If the reader is
interested in precision of measurements, he is advised to read references
2 to 4. Cameron⁵ considers the use of variance components in determining
the precision of estimating the clean content of wool, where
several cores are taken from each of a number of bales of wool (see
Exercises 25.4 and 25.5 for his technique).

Tukey⁶ considers the problems of estimating regression coefficients and
variance components when both variates (X and Y) are subject to error,
that is, each observed quantity = (steady part) + (fluctuation). In
most practical cases, a third variate (an instrumental variate) with special
properties is required. These special properties are that its covariances
with the fluctuations in both X and Y vanish, but its covariances with the
steady parts do not vanish. The three fields of application where
these problems have been most prevalent are indicated to be precision
of measurements, psychology, and econometrics. We have not discussed
problems of this nature in this book, because they are too complicated
for an elementary treatment. However, the reader is advised
to read Tukey's article at least to obtain some insight into the nature
of the problem. He provides a list of references, which should also be
useful.
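A small simulation can make the role of an instrumental variate concrete. The sketch below is an illustration under assumed conditions, not Tukey's procedure: the steady parts of X and Y are linearly related with slope `beta`, all fluctuations are independent normal errors, and a third variate Z shares the steady part but has its own independent fluctuation. The ratio-of-covariances slope is the standard instrumental-variable estimate, used here only to show the idea; all names and parameter values are hypothetical.

```python
import random


def covariance(a, b):
    """Sample covariance of two equal-length sequences."""
    mean_a = sum(a) / len(a)
    mean_b = sum(b) / len(b)
    return sum((u - mean_a) * (v - mean_b) for u, v in zip(a, b)) / (len(a) - 1)


def simulate(n=20000, beta=2.0, seed=1):
    """Compare the ordinary slope with the instrumental-variate slope."""
    rng = random.Random(seed)
    steady = [rng.gauss(0.0, 1.0) for _ in range(n)]
    # Observed quantity = (steady part) + (fluctuation), for X, Y and Z.
    x = [s + rng.gauss(0.0, 1.0) for s in steady]
    y = [beta * s + rng.gauss(0.0, 1.0) for s in steady]
    z = [s + rng.gauss(0.0, 1.0) for s in steady]
    # Ordinary least-squares slope of Y on X: biased toward zero,
    # because the fluctuation in X inflates the denominator.
    slope_ordinary = covariance(x, y) / covariance(x, x)
    # Instrumental slope: Z is uncorrelated with both fluctuations
    # but correlated with the steady parts.
    slope_instrumental = covariance(z, y) / covariance(z, x)
    return slope_ordinary, slope_instrumental


slope_ordinary, slope_instrumental = simulate()
print("ordinary slope:     %.3f" % slope_ordinary)
print("instrumental slope: %.3f" % slope_instrumental)
```

With equal variances for the steady part and the fluctuation in X, the ordinary slope is expected to be about half the true value, while the instrumental slope should lie close to `beta`.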

Some recent articles have been written on the power of current tests
of significance for variance components (see, for example, references 7
and 8). Cochran⁹ also discusses the power of his F" statistic, mentioned
in Chap. 23.

25.4. Summary of Needed Research. In these last four chapters,
we have presented a rather extensive treatment of a relatively new
statistical tool, components of variance. However, this is actually
the basis of all statistical concepts, because it deals with that particular
aspect of data which requires statistical treatment: variability. We
have here a method of identifying separate sources of variability and 
using estimates of these variances to plan future experiments, to make 
tests of significance, and to set confidence limits on a treatment or a 
general mean. 

Since this statistical tool is so new, adequate statistical theory has 
not been developed to apply it to all problems where it should be applied. 
Some of the problems in variance component analysis which should be 
studied are: 

(1) In a randomized complete-blocks model with all effects random
except the mean, the assumption of independence of the main effects
and interactions frequently does not appear to be justified. The assumption
of additivity needs to be explored in detail, especially with interactions
and main effects. Why do large main effects more often produce
significant interactions than do small main effects? Can anything be
done to correct this difficulty?

(2) A clear statement is needed of how to determine whether the 
interaction in a two-way classification is fixed or random. 

(3) A better method of handling finite populations is needed. 

(4) More exact methods of assigning confidence limits for variance 
components should be developed. 

(5) Better agreement is needed on how to handle the mixed model, 
which probably is the most important of all the models. 

(6) How can we detect correlated errors? And if we find that the 
errors are correlated, what should be done? 

(7) A study needs to be made of the efficiency of various methods of
analyzing multiple classifications with unequal numbers in the subclasses.
Also, short-cut computing techniques are needed, especially
some which can be used with card-punching and electronic equipment.

(8) What can be done to simplify the analysis of data with unequal 
variances? 

(9) What are the effects of various types of nonnormality on the
consistency and efficiency of estimates?

(10) What is the best test in a mixed model where the error must be 
estimated from several mean squares? 
(11) Some study needs to be made of the proper allocation of samples
in a nested sampling problem with limited resources and a need for good 
estimates of all components (see, for example, Exercise 22.19). 

EXERCISES 

25.1. (a) Prove Cochran’s results for the use of variance components 
in a covariance analysis, and apply them to Exercise 21.5. 

(b) Derive the same results for a randomized complete-blocks experiment.

25.2. (a) In Example 25.1, determine the value of $\lambda$ by use of the
regression model, as was done in Example 25.2.

(b) Estimate the deviations component of variance for the other four
samples with duplicate determinations.

(c) Show how the over-all estimates of c\ and \c J were obtained. 

(d) What is the proper error term to use in testing for the significance 
of a single regression line? 

(e) How would you set up a computing procedure to test the over-all
regression and the deviations of sample regressions from the over-all
regression?

(f) Is there much evidence that $\sigma^2$ was not constant from one sample
to another?

25.3. (a) In Example 13.1, assuming the between-families variance constant
from X to X, what is the expectation of SSR? Hence, how should you test the
hypothesis that $\beta = 0$?

(b) If a weighted analysis is used, what happens to the expectations
of the mean squares?

(c) On the basis of (b), what advantages do you see in the use of a
transformation instead of a weighted analysis, if the transformation
can make the variances reasonably stable?

25.4. Given the model 

$$Y_{ijk} = \mu + \alpha_i + \beta_{ij} + \gamma_{ijk},$$

where $\alpha_i$ represents the ith lot, $\beta_{ij}$ the jth bale in the ith lot, and
$\gamma_{ijk}$ the kth core in the (ij) bale. We shall assume all effects, except
$\mu$, are NID with respective variances $\sigma_\alpha^2$, $\sigma_\beta^2$, and
$\sigma_\gamma^2$. Assume we have 2M lots with n bales per lot and $k_1$ cores per
bale for M lots and $k_2$ cores per bale for the other M lots. All cores for
each lot are composited.

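(a) Show that the variances per lot mean are

$$\sigma_\alpha^2 + \frac{\sigma_\beta^2}{n} + \frac{\sigma_\gamma^2}{n k_1} \quad\text{and}\quad \sigma_\alpha^2 + \frac{\sigma_\beta^2}{n} + \frac{\sigma_\gamma^2}{n k_2}.$$
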
(b) Given that $V_1$ is the mean square between the M lots with $nk_1$
cores per lot and $V_2$ for the other M lots. Show that, if $k_2 > k_1$,

$$\frac{k_2 - k_1}{n}\,(n\sigma_\alpha^2 + \sigma_\beta^2) = k_2 V_2 - k_1 V_1.$$

(c) How could you design a sample to estimate both $\sigma_\alpha^2$ and $\sigma_\beta^2$?

25.5. Use the same model as in Exercise 25.4. Assume that $n_i$ bales
are selected from the ith of the first M lots with 2 cores per bale, 1 core
from each bale being composited to give 2 composite samples per lot.
Call the difference between these 2 samples $d_i$. For the other M lots,
$2n_k$ bales are taken from each, and again 2 cores per bale; both cores
from $n_k$ of the bales are composited, and similarly for the other $n_k$ bales,
again giving 2 composite samples per lot. Call the difference between
these two samples $d_k$.

(a) Show that $\sigma^2(d_i) = 2\sigma_\gamma^2/n_i$ and $\sigma^2(d_k) = (2\sigma_\beta^2 + \sigma_\gamma^2)/n_k$.

(b) Show that


<r 2 = — 




(c) What is the estimate of of? 

(d) What are the variances of the estimates of and of? 
Explanation of the Tables

Ia. Ordinates of the Normal Distribution. This table gives values of


$$f(x) = \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}$$


for values of x between 0 and 4 at intervals of .01. For negative values of x, one uses
the relation f(-x) = f(x).
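Any ordinate of the standard normal density, f(x) = exp(-x²/2)/√(2π), can be checked by direct computation. A minimal sketch in Python follows; the function name is hypothetical.

```python
import math


def normal_ordinate(x):
    """Ordinate of the standard normal density at x."""
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)


# Spot checks against tabulated ordinates, rounded to four decimals.
for x in (0.0, 1.0, 2.0):
    print(x, round(normal_ordinate(x), 4))

# Symmetry: the ordinate at -x equals the ordinate at x.
print(normal_ordinate(-1.3) == normal_ordinate(1.3))
```

Because the density is symmetric about zero, only non-negative x needs to be tabulated.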

Ib. Area under the Normal Curve, F(y). This table gives values of


$$F(y) = \int_{-\infty}^{y} \frac{1}{\sqrt{2\pi}}\, e^{-x^2/2}\, dx$$


for values of y between 0 and 3.5 at intervals of .01. For negative values of y, one
uses the relation $F(-y) = 1 - F(y) = \alpha/2$.

Values of y corresponding to a few selected values of $\alpha$ are presented beneath the
main table. These are useful in making tests of significance for a specified significance
probability, $\alpha$, or in setting up confidence intervals with confidence probability
$(1 - \alpha)$, as in Sec. 7.2 for the statistic $T_\alpha$.
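The area F(y) and the two-tailed probability α = 2[1 - F(y)] can be computed from the error function in the Python standard library. This is a minimal sketch; the function names are hypothetical.

```python
import math


def normal_area(y):
    """Area under the standard normal curve from minus infinity to y."""
    return 0.5 * (1.0 + math.erf(y / math.sqrt(2.0)))


def two_tailed_alpha(y):
    """Twice the area beyond y, for y >= 0."""
    return 2.0 * (1.0 - normal_area(y))


print(round(normal_area(1.96), 4))        # area up to y = 1.96
print(round(two_tailed_alpha(1.645), 2))  # two-tailed probability at y = 1.645
# Relation used for negative arguments: F(-y) = 1 - F(y).
print(abs(normal_area(-1.0) - (1.0 - normal_area(1.0))) < 1e-12)
```

The same two functions cover both the main table and the short table of percentage points printed beneath it.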

II. Percentage Points of the $\chi^2$ Distribution. This table gives values of $\chi^2_\alpha$ for selected
values of $\alpha$ and for the number of degrees of freedom, $\nu$, ranging from 1 to 30, plus
$\nu$ = 40, 50, 60, where


$$\alpha = \int_{\chi^2_\alpha}^{\infty} \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, (\chi^2)^{(\nu-2)/2}\, e^{-\chi^2/2}\, d(\chi^2).$$



For larger values of $\nu$, a normal approximation is quite accurate. The quantity
$\sqrt{2\chi^2} - \sqrt{2\nu - 1}$ is nearly normally distributed with zero mean and unit variance.
Hence $\chi^2_\alpha$ may be computed as

$$\chi^2_\alpha = \tfrac{1}{2}\left[T_{2\alpha} + \sqrt{2\nu - 1}\right]^2,$$

where we use $T_{2\alpha}$ because the probability for $\chi^2$ corresponds to a single tail of the
normal curve (see Sec. 7.2 and Table Ib). For example, consider $\chi^2_{.05}$ for $\nu = 40$.
The exact value is $\chi^2_{.05} = 55.8$, and the approximate value would be

$$\tfrac{1}{2}\left(1.645 + \sqrt{79}\right)^2 = 55.5.$$
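The normal approximation to an upper percentage point of χ², namely one half of the square of (single-tail normal deviate + √(2ν - 1)), is easy to evaluate for any number of degrees of freedom. The sketch below assumes Python 3.8 or later for `statistics.NormalDist`; the function name is hypothetical.

```python
import math
from statistics import NormalDist


def chi_square_point_approx(alpha, nu):
    """Approximate upper-alpha point of chi-square with nu degrees of freedom.

    Uses the single-tail normal deviate, that is, the deviate whose
    two-tailed probability is 2 * alpha.
    """
    deviate = NormalDist().inv_cdf(1.0 - alpha)
    return 0.5 * (deviate + math.sqrt(2.0 * nu - 1.0)) ** 2


approx_40 = chi_square_point_approx(0.05, 40)
print(round(approx_40, 1))
```

For 40 degrees of freedom the approximation falls a little below the exact tabulated value of 55.8.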

III. Percentage Points of the t Distribution. This table gives values of $t_\alpha$ for selected
values of $\alpha$ and for the following number of degrees of freedom, n: 1, 2, . . . , 30, 40,
60, 120, ∞, where


$$\alpha = 2\int_{t_\alpha}^{\infty} \frac{\Gamma[(n+1)/2]}{\sqrt{\pi n}\;\Gamma(n/2)} \left(1 + \frac{t^2}{n}\right)^{-(n+1)/2} dt.$$
If a percentage point is needed for some n not given in the table, linear interpolation
can be used between tabulated percentage points, but the reciprocals of n should be
used. For example, the value of $t_{.05}$ for 50 degrees of freedom would be

$$2.021 - \frac{1/40 - 1/50}{1/40 - 1/60}\,(.021) = 2.008.$$
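Interpolation on the reciprocals of the degrees of freedom can be written as a small helper. The sketch below reproduces the worked value for 50 degrees of freedom from the tabulated points for 40 and 60; the function name and argument order are hypothetical.

```python
def interpolate_on_reciprocals(n, n_low, t_low, n_high, t_high):
    """Linear interpolation in 1/n between two tabulated percentage points.

    n_low < n < n_high; t_low and t_high are the tabulated values
    for n_low and n_high degrees of freedom.
    """
    weight = (1.0 / n_low - 1.0 / n) / (1.0 / n_low - 1.0 / n_high)
    return t_low - weight * (t_low - t_high)


t_50 = interpolate_on_reciprocals(50, 40, 2.021, 60, 2.000)
print(round(t_50, 3))
```

Ordinary linear interpolation in n would give 2.0105 here; working in 1/n weights the point for 60 degrees of freedom more heavily, because percentage points change nearly linearly in 1/n for large n.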

IV. Percentage Points of the F Distribution. This table gives values of $F_\alpha$ for
$\alpha$ = .01 and .05; $n_1$ = 1 (1) 10, 12, 16, 20, 24, 30, 50, 100, ∞; and $n_2$ = 1 (1) 20 (2) 28
(4) 40, 60, 100, 200, ∞. $F_\alpha$ is defined as follows:


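$$\alpha = \int_{F_\alpha}^{\infty} \frac{n_1^{n_1/2}\, n_2^{n_2/2}}{B\!\left(\frac{n_1}{2}, \frac{n_2}{2}\right)}\, F^{(n_1-2)/2}\, (n_2 + n_1 F)^{-(n_1+n_2)/2}\, dF.$$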


$F = s_1^2/s_2^2$, where $s_1^2$ is a sample variance with $n_1$ degrees of freedom and $s_2^2$ a sample
variance with $n_2$ degrees of freedom. The table also can be used to determine $F_{1-\alpha}$
by use of the formula

$$F_{(1-\alpha)}(n_1, n_2) = \frac{1}{F_\alpha(n_2, n_1)},$$

where the first n indicates the number of degrees of freedom in the numerator and the
second n the number of degrees of freedom in the denominator. For example

$$F_{.95}(3, 75) = \frac{1}{F_{.05}(75, 3)} = \frac{1}{8.57}.$$

One should interpolate on the reciprocals of $n_1$ and $n_2$ as in Table III.
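The reciprocal rule for lower percentage points amounts to one division once the upper point with the degrees of freedom interchanged has been read from the table. A minimal sketch follows, using the tabulated value 8.57 from the example; the function name is hypothetical.

```python
def lower_f_point(upper_point_swapped):
    """Lower percentage point F_(1-alpha)(n1, n2).

    upper_point_swapped is the tabulated upper point F_alpha(n2, n1),
    that is, with numerator and denominator degrees of freedom interchanged.
    """
    return 1.0 / upper_point_swapped


f_95_3_75 = lower_f_point(8.57)   # F_.05(75, 3) = 8.57, as in the example
print(round(f_95_3_75, 4))
```

The result is the value below which a ratio with 3 and 75 degrees of freedom falls with probability .05.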
Table Ia

Ordinates of the Normal Distribution

  x     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
  .0   .3989  .3989  .3989  .3988  .3986  .3984  .3982  .3980  .3977  .3973
  .1   .3970  .3965  .3961  .3956  .3951  .3945  .3939  .3932  .3925  .3918
  .2   .3910  .3902  .3894  .3885  .3876  .3867  .3857  .3847  .3836  .3825
  .3   .3814  .3802  .3790  .3778  .3765  .3752  .3739  .3725  .3712  .3697
  .4   .3683  .3668  .3653  .3637  .3621  .3605  .3589  .3572  .3555  .3538
  .5   .3521  .3503  .3485  .3467  .3448  .3429  .3410  .3391  .3372  .3352
  .6   .3332  .3312  .3292  .3271  .3251  .3230  .3209  .3187  .3166  .3144
  .7   .3123  .3101  .3079  .3056  .3034  .3011  .2989  .2966  .2943  .2920
  .8   .2897  .2874  .2850  .2827  .2803  .2780  .2756  .2732  .2709  .2685
  .9   .2661  .2637  .2613  .2589  .2565  .2541  .2516  .2492  .2468  .2444
 1.0   .2420  .2396  .2371  .2347  .2323  .2299  .2275  .2251  .2227  .2203
 1.1   .2179  .2155  .2131  .2107  .2083  .2059  .2036  .2012  .1989  .1965
 1.2   .1942  .1919  .1895  .1872  .1849  .1826  .1804  .1781  .1758  .1736
 1.3   .1714  .1691  .1669  .1647  .1626  .1604  .1582  .1561  .1539  .1518
 1.4   .1497  .1476  .1456  .1435  .1415  .1394  .1374  .1354  .1334  .1315
 1.5   .1295  .1276  .1257  .1238  .1219  .1200  .1182  .1163  .1145  .1127
 1.6   .1109  .1092  .1074  .1057  .1040  .1023  .1006  .0989  .0973  .0957
 1.7   .0940  .0925  .0909  .0893  .0878  .0863  .0848  .0833  .0818  .0804
 1.8   .0790  .0775  .0761  .0748  .0734  .0721  .0707  .0694  .0681  .0669
 1.9   .0656  .0644  .0632  .0620  .0608  .0596  .0584  .0573  .0562  .0551
 2.0   .0540  .0529  .0519  .0508  .0498  .0488  .0478  .0468  .0459  .0449
 2.1   .0440  .0431  .0422  .0413  .0404  .0396  .0387  .0379  .0371  .0363
 2.2   .0355  .0347  .0339  .0332  .0325  .0317  .0310  .0303  .0297  .0290
 2.3   .0283  .0277  .0270  .0264  .0258  .0252  .0246  .0241  .0235  .0229
 2.4   .0224  .0219  .0213  .0208  .0203  .0198  .0194  .0189  .0184  .0180
 2.5   .0175  .0171  .0167  .0163  .0158  .0154  .0151  .0147  .0143  .0139
 2.6   .0136  .0132  .0129  .0126  .0122  .0119  .0116  .0113  .0110  .0107
 2.7   .0104  .0101  .0099  .0096  .0093  .0091  .0088  .0086  .0084  .0081
 2.8   .0079  .0077  .0075  .0073  .0071  .0069  .0067  .0065  .0063  .0061
 2.9   .0060  .0058  .0056  .0055  .0053  .0051  .0050  .0048  .0047  .0046
 3.0   .0044  .0043  .0042  .0040  .0039  .0038  .0037  .0036  .0035  .0034
 3.1   .0033  .0032  .0031  .0030  .0029  .0028  .0027  .0026  .0025  .0025
 3.2   .0024  .0023  .0022  .0022  .0021  .0020  .0020  .0019  .0018  .0018
 3.3   .0017  .0017  .0016  .0016  .0015  .0015  .0014  .0014  .0013  .0013
 3.4   .0012  .0012  .0012  .0011  .0011  .0010  .0010  .0010  .0009  .0009
 3.5   .0009  .0008  .0008  .0008  .0008  .0007  .0007  .0007  .0007  .0006
 3.6   .0006  .0006  .0006  .0005  .0005  .0005  .0005  .0005  .0005  .0004
 3.7   .0004  .0004  .0004  .0004  .0004  .0004  .0003  .0003  .0003  .0003
 3.8   .0003  .0003  .0003  .0003  .0003  .0002  .0002  .0002  .0002  .0002
 3.9   .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0002  .0001  .0001






Table Ib

Area under the Normal Curve, F(y) †

  y     .00    .01    .02    .03    .04    .05    .06    .07    .08    .09
  .0   .5000  .5040  .5080  .5120  .5160  .5199  .5239  .5279  .5319  .5359
  .1   .5398  .5438  .5478  .5517  .5557  .5596  .5636  .5675  .5714  .5753
  .2   .5793  .5832  .5871  .5910  .5948  .5987  .6026  .6064  .6103  .6141
  .3   .6179  .6217  .6255  .6293  .6331  .6368  .6406  .6443  .6480  .6517
  .4   .6554  .6591  .6628  .6664  .6700  .6736  .6772  .6808  .6844  .6879
  .5   .6915  .6950  .6985  .7019  .7054  .7088  .7123  .7157  .7190  .7224
  .6   .7257  .7291  .7324  .7357  .7389  .7422  .7454  .7486  .7517  .7549
  .7   .7580  .7611  .7642  .7673  .7704  .7734  .7764  .7794  .7823  .7852
  .8   .7881  .7910  .7939  .7967  .7995  .8023  .8051  .8078  .8106  .8133
  .9   .8159  .8186  .8212  .8238  .8264  .8289  .8315  .8340  .8365  .8389
 1.0   .8413  .8438  .8461  .8485  .8508  .8531  .8554  .8577  .8599  .8621
 1.1   .8643  .8665  .8686  .8708  .8729  .8749  .8770  .8790  .8810  .8830
 1.2   .8849  .8869  .8888  .8907  .8925  .8944  .8962  .8980  .8997  .9015
 1.3   .9032  .9049  .9066  .9082  .9099  .9115  .9131  .9147  .9162  .9177
 1.4   .9192  .9207  .9222  .9236  .9251  .9265  .9279  .9292  .9306  .9319
 1.5   .9332  .9345  .9357  .9370  .9382  .9394  .9406  .9418  .9429  .9441
 1.6   .9452  .9463  .9474  .9484  .9495  .9505  .9515  .9525  .9535  .9545
 1.7   .9554  .9564  .9573  .9582  .9591  .9599  .9608  .9616  .9625  .9633
 1.8   .9641  .9649  .9656  .9664  .9671  .9678  .9686  .9693  .9699  .9706
 1.9   .9713  .9719  .9726  .9732  .9738  .9744  .9750  .9756  .9761  .9767
 2.0   .9772  .9778  .9783  .9788  .9793  .9798  .9803  .9808  .9812  .9817
 2.1   .9821  .9826  .9830  .9834  .9838  .9842  .9846  .9850  .9854  .9857
 2.2   .9861  .9864  .9868  .9871  .9875  .9878  .9881  .9884  .9887  .9890
 2.3   .9893  .9896  .9898  .9901  .9904  .9906  .9909  .9911  .9913  .9916
 2.4   .9918  .9920  .9922  .9925  .9927  .9929  .9931  .9932  .9934  .9936
 2.5   .9938  .9940  .9941  .9943  .9945  .9946  .9948  .9949  .9951  .9952
 2.6   .9953  .9955  .9956  .9957  .9959  .9960  .9961  .9962  .9963  .9964
 2.7   .9965  .9966  .9967  .9968  .9969  .9970  .9971  .9972  .9973  .9974
 2.8   .9974  .9975  .9976  .9977  .9977  .9978  .9979  .9979  .9980  .9981
 2.9   .9981  .9982  .9982  .9983  .9984  .9984  .9985  .9985  .9986  .9986
 3.0   .9987  .9987  .9987  .9988  .9988  .9989  .9989  .9989  .9990  .9990
 3.1   .9990  .9991  .9991  .9991  .9992  .9992  .9992  .9992  .9993  .9993
 3.2   .9993  .9993  .9994  .9994  .9994  .9994  .9994  .9995  .9995  .9995
 3.3   .9995  .9995  .9995  .9996  .9996  .9996  .9996  .9996  .9996  .9997
 3.4   .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9997  .9998


Percentage Points of the Normal Distribution †

F(y)               .75    .90    .95    .975   .99    .995   .999   .9995  .99995  .999995
α = 2[1 - F(y)]    .50    .20    .10    .05    .02    .01    .002   .001   .0001   .00001
T_α                0.674  1.282  1.645  1.960  2.326  2.576  3.090  3.291  3.891   4.417

† F(y) is the area under the normal curve from -∞ to y; α is twice the area from y
to ∞ (area from -∞ to -y plus the area from y to ∞).
Percentage Points of the χ² Distribution




Table III

Percentage Points of the t Distribution †

  n \ α    .50     .40     .30     .20     .10      .05      .02      .01      .001
    1     1.000   1.376   1.963   3.078   6.314   12.706   31.821   63.657   636.619
    2      .816   1.061   1.386   1.886   2.920    4.303    6.965    9.925    31.598
    3      .765    .978   1.250   1.638   2.353    3.182    4.541    5.841    12.941
    4      .741    .941   1.190   1.533   2.132    2.776    3.747    4.604     8.610
    5      .727    .920   1.156   1.476   2.015    2.571    3.365    4.032     6.859
    6      .718    .906   1.134   1.440   1.943    2.447    3.143    3.707     5.959
    7      .711    .896   1.119   1.415   1.895    2.365    2.998    3.499     5.405
    8      .706    .889   1.108   1.397   1.860    2.306    2.896    3.355     5.041
    9      .703    .883   1.100   1.383   1.833    2.262    2.821    3.250     4.781
   10      .700    .879   1.093   1.372   1.812    2.228    2.764    3.169     4.587
   11      .697    .876   1.088   1.363   1.796    2.201    2.718    3.106     4.437
   12      .695    .873   1.083   1.356   1.782    2.179    2.681    3.055     4.318
   13      .694    .870   1.079   1.350   1.771    2.160    2.650    3.012     4.221
   14      .692    .868   1.076   1.345   1.761    2.145    2.624    2.977     4.140
   15      .691    .866   1.074   1.341   1.753    2.131    2.602    2.947     4.073
   16      .690    .865   1.071   1.337   1.746    2.120    2.583    2.921     4.015
   17      .689    .863   1.069   1.333   1.740    2.110    2.567    2.898     3.965
   18      .688    .862   1.067   1.330   1.734    2.101    2.552    2.878     3.922
   19      .688    .861   1.066   1.328   1.729    2.093    2.539    2.861     3.883
   20      .687    .860   1.064   1.325   1.725    2.086    2.528    2.845     3.850
   21      .686    .859   1.063   1.323   1.721    2.080    2.518    2.831     3.819
   22      .686    .858   1.061   1.321   1.717    2.074    2.508    2.819     3.792
   23      .685    .858   1.060   1.319   1.714    2.069    2.500    2.807     3.767
   24      .685    .857   1.059   1.318   1.711    2.064    2.492    2.797     3.745
   25      .684    .856   1.058   1.316   1.708    2.060    2.485    2.787     3.725
   26      .684    .856   1.058   1.315   1.706    2.056    2.479    2.779     3.707
   27      .684    .855   1.057   1.314   1.703    2.052    2.473    2.771     3.690
   28      .683    .855   1.056   1.313   1.701    2.048    2.467    2.763     3.674
   29      .683    .854   1.055   1.311   1.699    2.045    2.462    2.756     3.659
   30      .683    .854   1.055   1.310   1.697    2.042    2.457    2.750     3.646
   40      .681    .851   1.050   1.303   1.684    2.021    2.423    2.704     3.551
   60      .679    .848   1.046   1.296   1.671    2.000    2.390    2.660     3.460
  120      .677    .845   1.041   1.289   1.658    1.980    2.358    2.617     3.373
    ∞      .674    .842   1.036   1.282   1.645    1.960    2.326    2.576     3.291

† This table is abridged from Table III of Statistical Tables for Biological, Agricultural
and Medical Research, 3d ed., by R. A. Fisher and Frank Yates, published by
Oliver & Boyd, Ltd., London and Edinburgh, 1949. It is reproduced here with the
kind permission of the authors and their publishers. n is the number of degrees of
freedom and α twice the probability of exceeding the tabular value (or the probability
of being more than the tabular value or less than the negative of the tabular value).